The curve time series framework provides a convenient vehicle to
accommodate some nonstationary features into a stationary setup. 
We propose a new method to identify the dimensionality of curve time 
series based on the dynamical dependence across different curves. The 
practical implementation of our method boils down to an eigenanalysis
of a finite-dimensional matrix. Furthermore, the determination of
the dimensionality is equivalent to the identification of the nonzero
eigenvalues of the matrix, which we carry out in terms of some bootstrap
tests. Asymptotic properties of the proposed method are investigated.
In particular, our estimators for zero-eigenvalues enjoy the
fast convergence rate $n$, while the estimators for nonzero eigenvalues
converge at the standard $\sqrt{n}$-rate. The proposed methodology is
illustrated with both simulated and real data sets.



1. Introduction. A curve time series may consist of, for example, annual 
weather record charts, annual production charts or daily volatility curves 
(from morning to evening). In these examples, the curves are segments of 
a single long time series. One advantage of viewing them as a curve series is
to accommodate some nonstationary features (such as seasonal cycles or 
diurnal volatility patterns) into a stationary framework in a Hilbert space. 
There are other types of curve series that cannot be pieced together into a 
single long time series; for example, daily mean-variance efficient frontiers of 
portfolios, yield curves and intraday asset return distributions. See also an 
example of daily return density curves in Section 4.2. The goal of this paper 
is to identify the finite dimensionality of curve time series in the sense that 
the serial dependence across different curves is driven by a finite number of
scalar components. Therefore, the problem of modeling curve dynamics is 
reduced to that of modeling a finite-dimensional vector time series. 

Throughout this paper, we assume that the observed curve time series,
which we denote by $Y_1(\cdot),\ldots,Y_n(\cdot)$, are defined on a compact interval $\mathcal{I}$ and
are subject to errors in the sense that

(1.1) $Y_t(u) = X_t(u) + \varepsilon_t(u), \quad u \in \mathcal{I},$

where $X_t(\cdot)$ is the curve process of interest. The existence of the noise term
$\varepsilon_t(\cdot)$ reflects the fact that curves $X_t(\cdot)$ are seldom perfectly observed. They
are often only recorded on discrete grids and are subject to both experimental
error and numerical rounding. These noisy discrete data are smoothed
to yield "observed" curves $Y_t(\cdot)$. Note that both $X_t(\cdot)$ and $\varepsilon_t(\cdot)$ are
unobservable.

We assume that $\varepsilon_t(\cdot)$ is a white noise sequence in the sense that $E\{\varepsilon_t(u)\} = 0$
for all $t$ and $\mathrm{Cov}\{\varepsilon_t(u), \varepsilon_s(v)\} = 0$ for any $u, v \in \mathcal{I}$ provided $t \ne s$. This is
guaranteed since we may include all the dynamic elements of $Y_t(\cdot)$ into $X_t(\cdot)$.
Likewise, we may also assume that no parts of $X_t(\cdot)$ are white noise since
these parts should be absorbed into $\varepsilon_t(\cdot)$. We also assume that

(1.2) $\int_{\mathcal{I}} E\{X_t(u)^2 + \varepsilon_t(u)^2\}\,du < \infty,$

and both

(1.3) $\mu(u) = E\{X_t(u)\}, \quad M_k(u,v) = \mathrm{Cov}\{X_t(u), X_{t+k}(v)\}$

do not depend on $t$. Furthermore, we assume that $X_t(\cdot)$ and $\varepsilon_{t+k}(\cdot)$ are
uncorrelated for all integer $k$. Under condition (1.2), $X_t(\cdot)$ admits the
Karhunen-Loeve expansion

(1.4) $X_t(u) - \mu(u) = \sum_{j=1}^{\infty} \xi_{tj}\varphi_j(u),$

where $\xi_{tj} = \int_{\mathcal{I}} \{X_t(u) - \mu(u)\}\varphi_j(u)\,du$ with $\{\xi_{tj}, j \ge 1\}$ being a sequence of
scalar random variables with $E(\xi_{tj}) = 0$, $\mathrm{Var}(\xi_{tj}) = \lambda_j$ and $\mathrm{Cov}(\xi_{ti}, \xi_{tj}) = 0$
if $i \ne j$. We rank $\{\xi_{tj}, j \ge 1\}$ such that $\lambda_j$ is monotonically decreasing as $j$
increases.

We say that $X_t(\cdot)$ is $d$-dimensional if $\lambda_d \ne 0$ and $\lambda_{d+1} = 0$, where $d \ge 1$
is a finite integer; see Hall and Vial (2006). The primary goal of this
paper is to identify $d$ and to estimate the dynamic space $\mathcal{M}$ spanned by the
(deterministic) eigenfunctions $\varphi_1(\cdot),\ldots,\varphi_d(\cdot)$.

Hall and Vial (2006) tackle this problem under the assumption that the
curves $Y_1(\cdot),\ldots,Y_n(\cdot)$ are independent. Then the problem is insoluble in the
sense that one cannot separate $X_t(\cdot)$ from $\varepsilon_t(\cdot)$ in (1.1). This difficulty was
resolved in Hall and Vial (2006) under a "low noise" setting which assumes
that the noise $\varepsilon_t(\cdot)$ goes to 0 as the sample size goes to infinity. Our approach
is different and does not require the "low noise" condition, since we identify
$d$ and $\mathcal{M}$ in terms of the serial dependence of the curves. Our method relies
on the simple fact that $M_k(u,v) = \mathrm{Cov}\{Y_t(u), Y_{t+k}(v)\}$ for any $k \ne 0$, which
automatically filters out the noise $\varepsilon_t(\cdot)$; see (1.3). In this sense, the existence
of dynamic dependence across different curves makes the problem tractable.
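
The noise-filtering identity can be checked term by term. The following short derivation is a sketch that uses only (1.1), the white noise assumption on the noise term, and the assumption that the curve process and the noise are uncorrelated at all lags:

```latex
\begin{align*}
\mathrm{Cov}\{Y_t(u), Y_{t+k}(v)\}
 &= \mathrm{Cov}\{X_t(u), X_{t+k}(v)\}
  + \mathrm{Cov}\{X_t(u), \varepsilon_{t+k}(v)\} \\
 &\quad + \mathrm{Cov}\{\varepsilon_t(u), X_{t+k}(v)\}
  + \mathrm{Cov}\{\varepsilon_t(u), \varepsilon_{t+k}(v)\} \\
 &= M_k(u,v) + 0 + 0 + 0, \qquad k \ne 0 .
\end{align*}
```

For $k = 0$ the last term becomes $\mathrm{Cov}\{\varepsilon_t(u), \varepsilon_t(v)\}$, which need not vanish; this is why only nonzero lags are used.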

Dimension reduction plays an important role in functional data analysis.
The most frequently used method is the functional principal component
analysis in the form of applying the Karhunen-Loeve decomposition directly
to the observed curves. The literature in this field is vast and includes
Besse and Ramsay (1986), Dauxois, Pousse and Romain (1982), Ramsay
and Dalzell (1991), Rice and Silverman (1991) and Ramsay and Silverman
(2005). In spite of the methodological advancements with independent
observations, the work on functional time series has been of a more theoretical
nature; see, for example, Bosq (2000). The available inference methods focus
mostly on nonparametric estimation for some characteristics of functional
series [Part IV of Ferraty and Vieu (2006)]. As far as we are aware, the
work presented here represents the first attempt at dimension reduction
based on dynamic dependence, which is radically different from the existing
methods. Heuristically, our approach differs from functional principal
components analysis in one fundamental manner: in principal component
analysis the objective is to find the linear combinations of the data which
maximize variance. In contrast, we seek the linear combinations of the
data which represent the serial dependence in the data. Although we confine
ourselves to square integrable curve series in this paper, the methodology
may be extended to a more general functional framework including, for example,
a surface series which is particularly important for environmental
study; see, for example, Guillas and Lai (2010). A follow-up study in this
direction will be reported elsewhere.

The rest of the paper is organized as follows. Section 2 introduces the 
proposed new methodology for identifying the finite-dimensional dynamic 
structure. Although the Karhunen-Loeve decomposition (1.4) serves as a
starting point, we do not seek such a decomposition explicitly. Instead,
the eigenanalysis is performed on a positive-definite operator defined based
on the autocovariance function of the curve process. Furthermore, computationally
our method boils down to an eigenanalysis of a finite matrix, thus
requiring no direct computation of eigenfunctions in a functional space. The
relevant theoretical results are presented in Section 3. As our estimators for
the eigenvalues are essentially quadratic, the convergence rate of the estimators
for the zero-eigenvalues is $n$ while that for the nonzero eigenvalues is
the standard $\sqrt{n}$. Numerical illustration using both simulated and real datasets
is provided in Section 4. Given the nature of the subject concerned, it is inevitable
to make use of some operator theory in a Hilbert space. We collect
some relevant facts in Appendix A. We relegate all the technical proofs to
Appendix B.

2. Methodology. 

2.1. Characterize $d$ and $\mathcal{M}$ via serial dependence. Let $\mathcal{L}_2(\mathcal{I})$ denote the
Hilbert space consisting of all the square integrable curves defined on $\mathcal{I}$
equipped with the inner product

(2.1) $\langle f, g \rangle = \int_{\mathcal{I}} f(u) g(u)\,du, \quad f, g \in \mathcal{L}_2(\mathcal{I}).$

Now $M_k$ defined in (1.3) may be viewed as the kernel of a linear operator
acting on $\mathcal{L}_2(\mathcal{I})$, that is, for any $g \in \mathcal{L}_2(\mathcal{I})$, $M_k$ maps $g(u)$ to $\tilde g(u) \equiv
\int_{\mathcal{I}} M_k(u,v) g(v)\,dv$. For notational economy, we will use $M_k$ to denote both
the kernel and the operator. Appendix A lists some relevant facts about
operators in Hilbert spaces.

For $M_0$ defined in (1.3), we have a spectral decomposition of the form

(2.2) $M_0(u,v) = \sum_{j=1}^{\infty} \lambda_j \varphi_j(u)\varphi_j(v), \quad u, v \in \mathcal{I},$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues and $\varphi_1, \varphi_2, \ldots$ are the corresponding
orthonormal eigenfunctions (i.e., $\langle \varphi_i, \varphi_j \rangle = 1$ for $i = j$, and 0 otherwise).
Hence,

$\int_{\mathcal{I}} M_0(u,v)\varphi_j(v)\,dv = \lambda_j \varphi_j(u), \quad j \ge 1.$

Furthermore, the random curves $X_t(\cdot)$ admit the representation (1.4). We
assume in this paper that $X_t(\cdot)$ is $d$-dimensional (i.e., $\lambda_{d+1} = 0$). Therefore,

(2.3) $M_0(u,v) = \sum_{j=1}^{d} \lambda_j \varphi_j(u)\varphi_j(v), \quad X_t(u) = \mu(u) + \sum_{j=1}^{d} \xi_{tj}\varphi_j(u).$

It follows from (1.1) that

(2.4) $Y_t(u) = \mu(u) + \sum_{j=1}^{d} \xi_{tj}\varphi_j(u) + \varepsilon_t(u).$

Thus, the serial dependence of $Y_t(\cdot)$ is determined entirely by that of the
$d$-vector process $\xi_t = (\xi_{t1},\ldots,\xi_{td})'$ since $\varepsilon_t(\cdot)$ is white noise. By virtue of
the Karhunen-Loeve decomposition, $E\xi_t = 0$ and $\mathrm{Var}(\xi_t) = \mathrm{diag}(\lambda_1,\ldots,\lambda_d)$.
For some prescribed integer $p$, let

(2.5) $\widehat M_k(u,v) = \frac{1}{n-p}\sum_{j=1}^{n-p} \{Y_j(u) - \bar Y(u)\}\{Y_{j+k}(v) - \bar Y(v)\},$

where $\bar Y(\cdot) = n^{-1}\sum_{1 \le j \le n} Y_j(\cdot)$. The reason for truncating
the sums in (2.5) at $n - p$ as opposed to $n - k$ is to ensure a duality
operation which simplifies the computation for eigenfunctions; see Remark 2
at the end of Section 2.2.2. The conventional approach to estimate
$d$ and $\mathcal{M} = \mathrm{span}\{\varphi_1(\cdot),\ldots,\varphi_d(\cdot)\}$ is to perform an eigenanalysis on
$\widehat M_0$ and let $\hat d$ be the number of nonzero eigenvalues and $\widehat{\mathcal{M}}$ be spanned
by the $\hat d$ corresponding eigenfunctions; see, for example, Ramsay and Silverman
(2005) and references therein. However, this approach suffers from
complications due to the fact that $\widehat M_0$ is not a consistent estimator for $M_0$, as
$\mathrm{Cov}\{Y_t(u), Y_t(v)\} = M_0(u,v) + \mathrm{Cov}\{\varepsilon_t(u), \varepsilon_t(v)\}$. Therefore, $\widehat M_0$ needs to be
adjusted to remove the part due to $\varepsilon_t(\cdot)$ before the eigenanalysis may be
performed. Unfortunately, this is a nontrivial matter since both $X_t(\cdot)$ and
$\varepsilon_t(\cdot)$ are unobservable. An alternative is to let the variance of $\varepsilon_t(\cdot)$ decay to
0 as the sample size $n$ goes to infinity; see Hall and Vial (2006).

We adopt a different approach based on the fact that $\mathrm{Cov}\{Y_t(u), Y_{t+k}(v)\} =
M_k(u,v)$ for any $k \ne 0$, which ensures that $\widehat M_k$ is a legitimate estimator for
$M_k$; see (1.3) and (2.5).

Let $\Sigma_k = E(\xi_t \xi_{t+k}') \equiv (\sigma_{ij}^{(k)})$ be the autocovariance matrix of $\xi_t$ at lag $k$.
It is easy to see from (1.3) and (2.3) that $M_k(u,v) = \sum_{i,j=1}^{d} \sigma_{ij}^{(k)} \varphi_i(u)\varphi_j(v)$.
Define a nonnegative operator

(2.6) $N_k(u,v) = \int_{\mathcal{I}} M_k(u,z) M_k(v,z)\,dz = \sum_{i,j=1}^{d} w_{ij}^{(k)} \varphi_i(u)\varphi_j(v),$

where $W_k = (w_{ij}^{(k)}) = \Sigma_k \Sigma_k'$ is a nonnegative definite matrix. Then it holds
for any integer $k$ that

(2.7) $\int_{\mathcal{I}} N_k(u,v)\zeta(v)\,dv = 0 \quad \text{for any } \zeta(\cdot) \in \mathcal{M}^{\perp},$

where $\mathcal{M}^{\perp}$ denotes the orthogonal complement of $\mathcal{M}$ in $\mathcal{L}_2(\mathcal{I})$. Note (2.7)
also holds if we replace $N_k$ by the operator

(2.8) $K(u,v) = \sum_{k=1}^{p} N_k(u,v),$

which is also a nonnegative operator on $\mathcal{L}_2(\mathcal{I})$.
Proposition 1. Let the matrix $\Sigma_{k_0}$ be full-ranked for some $k_0 \ge 1$.
Then the assertions below hold.

(i) The operator $N_{k_0}$ has exactly $d$ nonzero eigenvalues, and $\mathcal{M}$ is the
linear space spanned by the corresponding $d$ eigenfunctions.

(ii) For $p \ge k_0$, (i) also holds for the operator $K$.

Remark 1. (i) The condition that $\mathrm{rank}(\Sigma_k) = d$ for some $k \ge 1$ is implied
by the assumption that $X_t(\cdot)$ is $d$-dimensional. In the case where
$\mathrm{rank}(\Sigma_k) < d$ for all $k$, the component with no serial correlations in $X_t(\cdot)$
should be absorbed into white noise $\varepsilon_t(\cdot)$; see similar arguments on modeling
vector time series in Peña and Box (1987) and Pan and Yao (2008).

(ii) The introduction of the operator $K$ in (2.8) is to pull together the
information at different lags. Using a single $N_k$ may lead to spurious choices
of $\hat d$.

(iii) Note that $\int_{\mathcal{I}} K(u,v)\zeta(v)\,dv = 0$ if and only if $\int_{\mathcal{I}} N_k(u,v)\zeta(v)\,dv = 0$
for all $1 \le k \le p$. However, we cannot use $M_k$ directly in defining $K$ since
it does not necessarily hold that $\int_{\mathcal{I}} \sum_{1 \le k \le p} M_k(u,v) g(v)\,dv \ne 0$ for all $g \in \mathcal{M}$.
This is due to the fact that $M_k$ are not nonnegative definite operators.

2.2. Estimation of d and M. 

2.2.1. Estimators and fitted dynamic models. Let $\psi_1,\ldots,\psi_d$ be the orthonormal
eigenfunctions of $K$ corresponding to its $d$ nonzero eigenvalues.
Then they form an orthonormal basis of $\mathcal{M}$; see Proposition 1(ii) above.
Hence, it holds that

$X_t(u) - \mu(u) = \sum_{j=1}^{d} \xi_{tj}\varphi_j(u) = \sum_{j=1}^{d} \eta_{tj}\psi_j(u),$

where $\eta_{tj} = \int_{\mathcal{I}} \{X_t(u) - \mu(u)\}\psi_j(u)\,du$. Therefore, the serial dependence of
$X_t(\cdot)$ [and also that of $Y_t(\cdot)$] can be represented by that of the $d$-vector
process $\eta_t = (\eta_{t1},\ldots,\eta_{td})'$. Since $(\xi_{tj}, \varphi_j)$ cannot be estimated directly from
$Y_t$ (see Section 2.1 above), we estimate $(\eta_{tj}, \psi_j)$ instead.

As we have stated above, $M_k$ for $k \ne 0$ may be directly estimated from
the observed curves $Y_t$; see (2.5). Hence, a natural estimator for $K$ may be
defined as

(2.9) $\widehat K(u,v) = \sum_{k=1}^{p} \int_{\mathcal{I}} \widehat M_k(u,z)\widehat M_k(v,z)\,dz$
$\qquad = \frac{1}{(n-p)^2} \sum_{t,s=1}^{n-p} \sum_{k=1}^{p} \{Y_t(u) - \bar Y(u)\}\{Y_s(v) - \bar Y(v)\} \langle Y_{t+k} - \bar Y, Y_{s+k} - \bar Y \rangle;$

see (2.8), (2.6), (2.5) and (2.1).
By Proposition 1, we define $\hat d$ to be the number of nonzero eigenvalues
of $\widehat K$ (see Section 2.2.3 below) and $\widehat{\mathcal{M}}$ to be the linear space spanned by
the $\hat d$ corresponding orthonormal eigenfunctions $\hat\psi_1(\cdot),\ldots,\hat\psi_{\hat d}(\cdot)$. This leads
to the fitting

(2.10) $\widehat Y_t(u) = \bar Y(u) + \sum_{j=1}^{\hat d} \hat\eta_{tj}\hat\psi_j(u), \quad u \in \mathcal{I},$

where

(2.11) $\hat\eta_{tj} = \int_{\mathcal{I}} \{Y_t(u) - \bar Y(u)\}\hat\psi_j(u)\,du, \quad j = 1,\ldots,\hat d.$

Although $\hat\psi_j$ are not the estimators for the eigenfunctions $\varphi_j$ of $M_0$ defined
in (2.2), $\widehat{\mathcal{M}} = \mathrm{span}\{\hat\psi_1(\cdot),\ldots,\hat\psi_{\hat d}(\cdot)\}$ is a consistent estimator of $\mathcal{M} =
\mathrm{span}\{\varphi_1(\cdot),\ldots,\varphi_d(\cdot)\}$ (Theorem 2 in Section 3 below).

In order to model the dynamic behavior of $Y_t(\cdot)$, we only need to model
the $\hat d$-dimensional vector process $\hat\eta_t = (\hat\eta_{t1},\ldots,\hat\eta_{t\hat d})'$; see (2.10) above. This
may be done using VARMA or any other multivariate time series models.
See also Tiao and Tsay (1989) for applying linear transformations in order
to obtain a more parsimonious model for $\hat\eta_t$.

The integer $p$ used in (2.5) may be selected in the same spirit as the
maximum lag used in, for example, the Ljung-Box-Pierce portmanteau test
for white noise. In practice, we often choose $p$ to be a small positive integer.
Note that $k_0$ fulfilling the condition of Proposition 1 is often small since
serial dependence decays as the lag increases for most practical data.

2.2.2. Eigenanalysis. To perform an eigenanalysis in a Hilbert space is
not a trivial matter. A popular pragmatic approach is to use an approximation
via discretization, that is, to evaluate the observed curves at a fine
grid and to replace the observed curves by the resulting vectors. This is an
approximate method that effectively transforms the problem into an eigenanalysis
for a finite matrix. See, for example, Section 8.4 of Ramsay and Silverman
(2005). Below we also transform the problem into an eigenanalysis of a finite
matrix but not via any approximations. Instead we make use of the
well-known duality property that $AB'$ and $B'A$ share the same nonzero
eigenvalues for any matrices $A$ and $B$ of the same sizes. Furthermore, if
$\gamma$ is an eigenvector of $B'A$, $A\gamma$ is an eigenvector of $AB'$ with the same
eigenvalue. In fact, this duality also holds for operators in a Hilbert space.
This scheme was adopted in Kneip and Utikal (2001) and Benko, Hardle
and Kneip (2009).

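We present a heuristic argument first. To view the operator $\widehat K(\cdot,\cdot)$ defined
in (2.9) in the form of $AB'$, let us denote the curve $Y_t(\cdot) - \bar Y(\cdot)$ as
an $\infty \times 1$ vector $\mathbf{Y}_t$ with $\mathbf{Y}_t'\mathbf{Y}_s = \langle Y_t - \bar Y, Y_s - \bar Y \rangle$; see (2.1). Put $\mathcal{Y}_k =
(\mathbf{Y}_{1+k},\ldots,\mathbf{Y}_{n-p+k})$. Then $\widehat K(\cdot,\cdot)$ may be represented as an $\infty \times \infty$ matrix

$\widehat{\mathbf{K}} = \frac{1}{(n-p)^2}\, \mathcal{Y}_0 \sum_{k=1}^{p} \mathcal{Y}_k'\mathcal{Y}_k \mathcal{Y}_0'.$

Applying the duality with $A = \mathcal{Y}_0$ and $B' = \sum_{1 \le k \le p} \mathcal{Y}_k'\mathcal{Y}_k\mathcal{Y}_0'$, $\widehat{\mathbf{K}}$ shares the
same nonzero eigenvalues with the $(n-p) \times (n-p)$ matrix

(2.12) $\mathbf{K}^* = \frac{1}{(n-p)^2} \sum_{k=1}^{p} \mathcal{Y}_k'\mathcal{Y}_k\mathcal{Y}_0'\mathcal{Y}_0,$

where the $(t,s)$th element of $\mathcal{Y}_k'\mathcal{Y}_k$ is $\mathbf{Y}_{t+k}'\mathbf{Y}_{s+k} = \langle Y_{t+k} - \bar Y, Y_{s+k} - \bar Y \rangle$ and
$k = 0, 1, \ldots, p$. Furthermore, let $\gamma_j = (\gamma_{1j},\ldots,\gamma_{n-p,j})'$, $j = 1,\ldots,\hat d$, be the
eigenvectors of $\mathbf{K}^*$ corresponding to the $\hat d$ largest eigenvalues. Then

(2.13) $\sum_{t=1}^{n-p} \gamma_{tj}\{Y_t(\cdot) - \bar Y(\cdot)\}, \quad j = 1,\ldots,\hat d,$

are the $\hat d$ eigenfunctions of $\widehat K(\cdot,\cdot)$. Note that the functions in (2.13) may
not be orthogonal with each other. Thus, the orthonormal eigenfunctions
$\hat\psi_1(\cdot),\ldots,\hat\psi_{\hat d}(\cdot)$ used in (2.10) may be obtained by applying a Gram-Schmidt
algorithm to the functions given in (2.13).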

The heuristic argument presented above is justified by the result below. The
formal proof is relegated to Appendix B.

Proposition 2. The operator $\widehat K(\cdot,\cdot)$ shares the same nonzero eigenvalues
with the matrix $\mathbf{K}^*$ defined in (2.12), with the corresponding eigenfunctions
given in (2.13).

Remark 2. The truncation of the sums in (2.5) at $(n - p)$ for different
$k$ is necessary to ensure the applicability of the above duality operation. If
we truncated the sum for $\widehat M_k$ at $(n - k)$ instead, $\mathcal{Y}_k'\mathcal{Y}_k$ would be of different
sizes for different $k$, and $\mathbf{K}^*$ in (2.12) would not be well defined.
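
The duality computation can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code: the function names `dual_eigen`, `orthonormalize` and `direct_operator` are hypothetical, the curves are assumed to be stored on an equispaced grid, and the inner product (2.1) is approximated by a Riemann sum with spacing `du` (an assumption made only so that the Gram matrices can be computed). The toy data at the end are likewise made up for the demonstration.

```python
import numpy as np

def dual_eigen(curves, p, du):
    """Eigen-pairs of K-hat obtained from the (n-p) x (n-p) matrix K* in (2.12).

    curves : (n, m) array, row t holds Y_t on an equispaced grid
    p      : maximum lag
    du     : grid spacing, so that <f, g> is approximated by sum(f * g) * du
    """
    n, _ = curves.shape
    centered = curves - curves.mean(axis=0)          # Y_t - Ybar
    y0 = centered[: n - p]                           # curves t = 1, ..., n-p
    gram0 = y0 @ y0.T * du                           # entries <Y_t - Ybar, Y_s - Ybar>
    lagged = np.zeros((n - p, n - p))
    for k in range(1, p + 1):
        yk = centered[k : n - p + k]
        lagged += yk @ yk.T * du                     # sum over k of lag-k Gram matrices
    k_star = lagged @ gram0 / (n - p) ** 2           # matrix K* in (2.12)
    vals, vecs = np.linalg.eig(k_star)               # K* is not symmetric in general
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    funcs = vecs.T @ y0                              # rows: functions (2.13) on the grid
    return vals, funcs, k_star

def orthonormalize(funcs, du):
    """Gram-Schmidt step (done via QR) under the inner product sum(f * g) * du."""
    q, _ = np.linalg.qr(funcs.T * np.sqrt(du))
    return q.T / np.sqrt(du)

def direct_operator(curves, p, du):
    """Grid version of K-hat in (2.9) itself, used only as a cross-check."""
    n, m = curves.shape
    c = curves - curves.mean(axis=0)
    kernel = np.zeros((m, m))
    for k in range(1, p + 1):
        m_k = c[: n - p].T @ c[k : n - p + k] / (n - p)   # M-hat_k(u, v)
        kernel += m_k @ m_k.T * du                        # integral over z
    return kernel * du                                    # acts on grid functions

# Toy data (hypothetical): one AR(1)-driven component plus grid noise.
rng = np.random.default_rng(0)
n, m, p = 40, 60, 3
u = (np.arange(m) + 0.5) / m
du = 1.0 / m
xi = np.zeros(n)
for t in range(1, n):
    xi[t] = 0.6 * xi[t - 1] + rng.standard_normal()
curves = np.outer(xi, np.sqrt(2) * np.cos(np.pi * u)) + 0.3 * rng.standard_normal((n, m))

vals, funcs, k_star = dual_eigen(curves, p, du)
psi_hat = orthonormalize(funcs[:2], du)              # two leading orthonormal eigenfunctions
```

The leading eigenvalues returned by `dual_eigen` should agree with those of `direct_operator(curves, p, du)`, which is the numerical counterpart of Proposition 2. The matrix handled by the eigen-solver is $(n-p) \times (n-p)$ regardless of how fine the grid is.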

2.2.3. Determination of $d$ via statistical tests. Although the number of
nonzero eigenvalues of operator $K(\cdot,\cdot)$ defined in (2.8) is $d$ [Proposition 1(ii)],
the number of nonzero eigenvalues of its estimator $\widehat K(\cdot,\cdot)$ defined in (2.9)
may be much greater than $d$ due to random fluctuation in the sample. One
empirical approach is to take $\hat d$ to be the number of "large" eigenvalues of
$\widehat K$ in the sense that the $(\hat d + 1)$th largest eigenvalue drops significantly; see
also Theorem 3 in Section 3 and Figure 1 in Section 4.1. Hyndman and
Ullah (2007) proposed to choose $d$ by minimizing forecasting errors. Below,
we present a bootstrap test to determine the value of $d$.
Let $\theta_1 \ge \theta_2 \ge \cdots \ge 0$ be the eigenvalues of $K$. If the true dimensionality is
$d = d_0$, we expect to reject the null hypothesis $\theta_{d_0} = 0$, and not to reject the
hypothesis $\theta_{d_0+1} = 0$. Suppose we are interested in testing the null hypothesis

(2.14) $H_0 : \theta_{d_0+1} = 0,$

where $d_0$ is a known integer, obtained, for example, by visual observation
of the estimated eigenvalues $\hat\theta_1 \ge \hat\theta_2 \ge \cdots \ge 0$ of $\widehat K$. Hence, we reject $H_0$ if
$\hat\theta_{d_0+1} > l_\alpha$, where $l_\alpha$ is the critical value at the $\alpha \in (0,1)$ significance level. To
evaluate the critical value $l_\alpha$, we propose the following bootstrap procedure.

1. Let $\widehat Y_t(\cdot)$ be defined as in (2.10) with $\hat d = d_0$. Let $\hat\varepsilon_t(\cdot) = Y_t(\cdot) - \widehat Y_t(\cdot)$.

2. Generate a bootstrap sample from the model

$Y_t^*(\cdot) = \widehat Y_t(\cdot) + \varepsilon_t^*(\cdot),$

where $\varepsilon_t^*$ are drawn independently (with replacement) from $\{\hat\varepsilon_1,\ldots,\hat\varepsilon_n\}$.

3. Form an operator $K^*$ in the same manner as $\widehat K$ with $\{Y_t\}$ replaced by
$\{Y_t^*\}$, and compute the $(d_0 + 1)$th largest eigenvalue $\theta^*_{d_0+1}$ of $K^*$.

Then the conditional distribution of $\theta^*_{d_0+1}$, given the observations $\{Y_1,\ldots,Y_n\}$,
is taken as the distribution of $\hat\theta_{d_0+1}$ under $H_0$. In practical implementation,
we repeat Steps 2 and 3 above $B$ times for some large integer $B$, and we
reject $H_0$ if the event that $\theta^*_{d_0+1} \ge \hat\theta_{d_0+1}$ occurs not more than $[\alpha B]$ times.
The simulation results reported in Section 4.1 below indicate that the above
bootstrap method works well.
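
The three bootstrap steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: curves are assumed to be stored on an equispaced grid with spacing `du`, integrals are replaced by Riemann sums, and for brevity the eigenanalysis uses the discretized operator directly rather than the duality of Section 2.2.2. The names `khat_eig` and `bootstrap_pvalue` are hypothetical.

```python
import numpy as np

def khat_eig(curves, p, du):
    """Eigenvalues and orthonormal eigenfunctions of the grid version of K-hat."""
    n, m = curves.shape
    c = curves - curves.mean(axis=0)
    kernel = np.zeros((m, m))
    for k in range(1, p + 1):
        m_k = c[: n - p].T @ c[k : n - p + k] / (n - p)   # M-hat_k(u, v) as in (2.5)
        kernel += m_k @ m_k.T * du                        # integral over z
    vals, vecs = np.linalg.eigh(kernel * du)
    order = np.argsort(-vals)
    return vals[order], vecs[:, order].T / np.sqrt(du)    # rows have unit L2 norm

def bootstrap_pvalue(curves, d0, p, du, n_boot=200, seed=0):
    """Bootstrap p-value for H0: the (d0+1)th largest eigenvalue of K is 0."""
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    ybar = curves.mean(axis=0)
    vals, psi = khat_eig(curves, p, du)
    theta_hat = vals[d0]                                  # (d0+1)th largest eigenvalue
    scores = (curves - ybar) @ psi[:d0].T * du            # eta-hat_{tj} as in (2.11)
    fitted = ybar + scores @ psi[:d0]                     # step 1: fit (2.10) with d0
    resid = curves - fitted                               # step 1: residual curves
    exceed = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                  # step 2: resample residuals
        boot = fitted + resid[idx]
        theta_star = khat_eig(boot, p, du)[0][d0]         # step 3: bootstrap eigenvalue
        exceed += int(theta_star >= theta_hat)
    return theta_hat, exceed / n_boot
```

With this convention, rejecting $H_0$ when the exceedance count is at most $[\alpha B]$ corresponds to rejecting when the returned proportion is at most roughly $\alpha$.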

Remark 3. The serial dependence in $X_t$ could provide an alternative
method for testing hypothesis (2.14). Under model (2.4), the projected series
of the curves $Y_t(\cdot)$ on any direction perpendicular to $\mathcal{M}$ is white noise.
Put $U_t = \langle Y_t, \hat\psi_{d_0+1} \rangle$, $t = 1,\ldots,n$. Then $U_t$ would behave like a (scalar) white
noise under $H_0$. However, for example, the Ljung-Box-Pierce portmanteau
test for white noise coupled with the standard $\chi^2$-approximation does not
work well in this context. This is due to the fact that the $(d_0 + 1)$th largest
eigenvalue of $\widehat K$ is effectively the extreme value of the estimates for all the
zero-eigenvalues of $K$. Therefore, $\hat\psi_{d_0+1}$ is not an estimate for a fixed direction,
which makes the $\chi^2$-approximation for the Ljung-Box-Pierce statistic
mathematically invalid. Indeed some simulation results, not reported here,
indicate that the $\chi^2$-approximation tends to underestimate the critical values
for the Ljung-Box-Pierce test in this particular context.

3. Theoretical properties. Before presenting the asymptotic results, we
first solidify some notation. Denote by $(\theta_j, \psi_j)$ and $(\hat\theta_j, \hat\psi_j)$ the (eigenvalue,
eigenfunction) pairs of $K$ and $\widehat K$, respectively [see (2.8) and (2.9)]. We always
arrange the eigenvalues in descending order, that is, $\theta_j \ge \theta_{j+1}$. As the
eigenfunctions of $K$ and $\widehat K$ are unique only up to sign changes, in the sequel,



10 N. BATHIA, Q. YAO AND F. ZIEGELMANN 

it will go without saying that the right versions are used. Furthermore, recall
that $\theta_j = 0$ for all $j \ge d + 1$. Thus, the eigenfunctions $\psi_j$ are not identified
for $j \ge d + 1$. We take this last point into consideration in our theory. We
always assume that the dimension $d \ge 1$ is a fixed finite integer, and $p \ge 1$
is also a fixed finite integer.

For simplicity in the proofs, we suppose that $E\{Y_t(\cdot)\} = \mu(\cdot)$ is known
and thus set $\bar Y(\cdot) = \mu(\cdot)$. Straightforward adjustments to our arguments can
be made when this is not the case. We denote by $\|L\|_S$ the Hilbert-Schmidt
norm for any operator $L$; see Appendix A. Our asymptotic results are based
on the following regularity conditions:

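C1. $\{Y_t(\cdot)\}$ is strictly stationary and $\psi$-mixing with the mixing coefficient
defined as

$\psi(l) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{l}^{\infty},\, P(A)P(B) > 0} |1 - P(B|A)/P(B)|,$

where $\mathcal{F}_i^j = \sigma\{Y_i(\cdot),\ldots,Y_j(\cdot)\}$ for any $j \ge i$. In addition, it holds that
$\sum_{l=1}^{\infty} l\,\psi^{1/2}(l) < \infty$.

C2. $E\{\int_{\mathcal{I}} Y_t(u)^2\,du\}^2 < \infty$.

C3. $\theta_1 > \cdots > \theta_d > 0 = \theta_{d+1} = \cdots$, that is, all the nonzero eigenvalues of
$K$ are different.

C4. $\mathrm{Cov}\{X_s(u), \varepsilon_t(v)\} = 0$ for all $s, t$ and $u, v \in \mathcal{I}$.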

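Theorem 1. Let conditions C1-C4 hold. Then as $n \to \infty$, the following
assertions hold:

(i) $\|\widehat K - K\|_S = O_p(n^{-1/2})$.

(ii) For $j = 1,\ldots,d$, $|\hat\theta_j - \theta_j| = O_p(n^{-1/2})$ and

$\left(\int_{\mathcal{I}} \{\hat\psi_j(u) - \psi_j(u)\}^2\,du\right)^{1/2} = O_p(n^{-1/2}).$

(iii) For $j \ge d + 1$, $\hat\theta_j = O_p(n^{-1})$.

(iv) Let $\{\psi_j : j \ge d + 1\}$ be a complete orthonormal basis of $\mathcal{M}^{\perp}$, and put

$f_j(\cdot) = \sum_{i=d+1}^{\infty} \langle \psi_i, \hat\psi_j \rangle \psi_i(\cdot).$

Then for any $j \ge d + 1$,

$\left(\int_{\mathcal{I}} \left\{\sum_{i=1}^{d} \langle \psi_i, \hat\psi_j \rangle \psi_i(u)\right\}^2 du\right)^{1/2} = \left(\int_{\mathcal{I}} \{\hat\psi_j(u) - f_j(u)\}^2\,du\right)^{1/2} = O_p(n^{-1/2}).$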

Remark 4. (a) In the above theorem, assertions (i) and (ii) are standard.
(In fact, those results still hold for $d = \infty$.)

(b) Assertion (iv) implies that the estimated eigenfunctions $\hat\psi_{d+j}$, $j \ge 1$,
are asymptotically in the orthogonal complement of the dynamic space $\mathcal{M}$.

(c) The fast convergence rate $n$ in assertion (iii) deserves some further
explanation. To this end, we consider a simple analogue: let $A_1,\ldots,A_n$ be a
sample of stationary random variables, and we are interested in estimating
$\mu^2 = (EA_t)^2$ for which we use the estimator $\bar A^2 = (n^{-1}\sum_{t=1}^{n} A_t)^2 =
n^{-2}\sum_{t,s=1}^{n} A_s A_t$. Then under appropriate regularity conditions, it holds that

(3.1) $|\bar A^2 - \mu^2| \le |\mu||\bar A - \mu| + |\bar A^2 - \bar A\mu| = |\mu| \cdot O_p(n^{-1/2}) + O_p(n^{-1})$

as $|\bar A - \mu| = O_p(n^{-1/2})$ and $|\bar A^2 - \bar A\mu| = O_p(n^{-1})$. The latter follows
from a simple $U$-statistic argument; see Lee (1990). It is easy to see
from (3.1) that $|\bar A^2 - \mu^2| = O_p(n^{-1/2})$ if $\mu \ne 0$, and $|\bar A^2 - \mu^2| = O_p(n^{-1})$ if
$\mu = 0$. In our context, the operator $\widehat K = \sum_{k=1}^{p} \int_{\mathcal{I}} \widehat M_k(u,r)\widehat M_k(v,r)\,dr =
(n-p)^{-2}\sum_{k=1}^{p}\sum_{i,j=1}^{n-p} Z_{ik}Z_{jk}^*(u,v)$, where $Z_{tk}(u,v) = \{Y_t(u) - \mu(u)\}\{Y_{t+k}(v) -
\mu(v)\}$ and $Z_{ik}Z_{jk}^*(u,v) = \int_{\mathcal{I}} Z_{ik}(u,r)Z_{jk}(v,r)\,dr$, is similar to $\bar A^2$, and hence
the convergence properties stated in Theorem 1(iii) [and also (ii)]. The fast
convergence rate, which is termed "super-consistent" in the econometric literature,
is illustrated via simulation in Section 4.1 below; see Figures 4-7.
It makes the identification of zero-eigenvalues easier; see Figure 1.
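
The two rates in the scalar analogue can be seen in a small simulation. The sketch below assumes i.i.d. normal observations with unit variance, which is a simplification of the stationary setting in the remark; the function name `scaled_errors` is hypothetical.

```python
import numpy as np

def scaled_errors(mu, n, reps, seed=0):
    """Simulate |Abar^2 - mu^2| over `reps` i.i.d. N(mu, 1) samples of size n."""
    rng = np.random.default_rng(seed)
    a_bar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
    return np.abs(a_bar**2 - mu**2)

for n in (100, 400, 1600):
    err_zero = scaled_errors(0.0, n, reps=2000)     # mu = 0: error of order 1/n
    err_one = scaled_errors(1.0, n, reps=2000)      # mu = 1: error of order 1/sqrt(n)
    # Both printed columns should stay roughly constant as n grows.
    print(n, n * err_zero.mean(), np.sqrt(n) * err_one.mean())
```

Under these assumptions $n\bar A^2$ follows a $\chi^2_1$ distribution when $\mu = 0$, so the first scaled column should hover around 1, while for $\mu = 1$ the error has to be multiplied by $\sqrt{n}$ to stabilize.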

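With $d$ known, let $\widehat{\mathcal{M}} = \mathrm{span}\{\hat\psi_1(\cdot),\ldots,\hat\psi_d(\cdot)\}$, where $\hat\psi_1(\cdot),\ldots,\hat\psi_d(\cdot)$ are
the eigenfunctions of $\widehat K$ corresponding to the $d$ largest eigenvalues. In order
to measure the discrepancy between $\mathcal{M}$ and $\widehat{\mathcal{M}}$, we introduce the following
metric. Let $\mathcal{N}_1$ and $\mathcal{N}_2$ be any two $d$-dimensional subspaces of $\mathcal{L}_2(\mathcal{I})$. Let
$\{\zeta_{i1}(\cdot),\ldots,\zeta_{id}(\cdot)\}$ be an orthonormal basis of $\mathcal{N}_i$, $i = 1, 2$. Then the projection
of $\zeta_{1k}$ onto $\mathcal{N}_2$ may be expressed as

$\sum_{j=1}^{d} \langle \zeta_{2j}, \zeta_{1k} \rangle \zeta_{2j}(u).$

Its squared norm is $\sum_{j=1}^{d} (\langle \zeta_{2j}, \zeta_{1k} \rangle)^2 \le 1$. The discrepancy measure is defined as

(3.2) $D(\mathcal{N}_1, \mathcal{N}_2) = \sqrt{1 - \frac{1}{d}\sum_{j,k=1}^{d} (\langle \zeta_{2j}, \zeta_{1k} \rangle)^2}.$

It is clear that this is a symmetric measure between 0 and 1. It is independent
of the choice of the orthonormal bases used in the definition, and it equals
0 if and only if $\mathcal{N}_1 = \mathcal{N}_2$. Let $\mathcal{Z}$ be the set consisting of all the $d$-dimensional
subspaces in $\mathcal{L}_2(\mathcal{I})$. Then $(\mathcal{Z}, D)$ forms a metric space in the sense that $D$ is
a well-defined distance measure on $\mathcal{Z}$ (see Lemma 4 in Appendix B below).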
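
As an illustration of the discrepancy measure (3.2), the following sketch evaluates it for bases stored on a grid. The function name `subspace_distance` and the Riemann-sum inner product with spacing `du` are assumptions made for the example.

```python
import numpy as np

def subspace_distance(basis1, basis2, du):
    """Discrepancy D(N1, N2) of (3.2) for two d-dimensional subspaces.

    basis1, basis2 : (d, m) arrays whose rows are orthonormal functions on a grid,
                     orthonormal with respect to <f, g> ~ sum(f * g) * du
    """
    d = basis1.shape[0]
    inner = basis2 @ basis1.T * du           # (j, k) entry <zeta_2j, zeta_1k>
    value = 1.0 - np.sum(inner**2) / d
    return np.sqrt(max(value, 0.0))          # guard against tiny negative rounding
```

For example, with the cosine functions $\sqrt{2}\cos(\pi i u)$ on a midpoint grid of $[0,1]$, the span of the first two functions has distance 0 from any rotation of itself, distance 1 from the span of the third and fourth, and distance $\sqrt{1/2}$ from the span of the first and third.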

Theorem 2. Let the conditions of Theorem 1 hold. Suppose that $d$ is
known. Then as $n \to \infty$, it holds that $D(\widehat{\mathcal{M}}, \mathcal{M}) = O_p(n^{-1/2})$.
[Figure 1: three panels, (a) $d = 2$, (b) $d = 4$, (c) $d = 6$, each showing the averages of the 1st to the 10th largest eigenvalues.]

Fig. 1. The average estimated eigenvalues over the 200 replications with sample sizes
n = 100 (solid lines), 300 (dotted lines) and 600 (dashed lines).



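Remark 5. Our estimation of $\mathcal{M}$ is asymptotically adaptive to $d$. To
this end, let $\hat d$ be a consistent estimator of $d$ in the sense that $P(\hat d = d) \to 1$,
and $\widetilde{\mathcal{M}} = \mathrm{span}\{\hat\psi_1,\ldots,\hat\psi_{\hat d}\}$ be the estimator of $\mathcal{M}$ with $d$ estimated by $\hat d$.
Since $\hat d$ may differ from $d$, we use the modified metric $\widetilde D$, defined in (4.1)
below, to measure the difference between $\widetilde{\mathcal{M}}$ and $\mathcal{M}$. Then it holds for any
constant $C > 0$ that

$P\{n^{1/2}|\widetilde D(\widetilde{\mathcal{M}}, \mathcal{M}) - D(\widehat{\mathcal{M}}, \mathcal{M})| > C\}$
$\quad \le P\{n^{1/2}|\widetilde D(\widetilde{\mathcal{M}}, \mathcal{M}) - D(\widehat{\mathcal{M}}, \mathcal{M})| > C \mid \hat d = d\}\,P(\hat d = d) + P(\hat d \ne d)$
$\quad \le P\{n^{1/2}|\widetilde D(\widetilde{\mathcal{M}}, \mathcal{M}) - D(\widehat{\mathcal{M}}, \mathcal{M})| > C \mid \hat d = d\} + o(1).$

Note that when $\hat d = d$, $\widetilde{\mathcal{M}} = \widehat{\mathcal{M}}$ and thus $\widetilde D(\widetilde{\mathcal{M}}, \mathcal{M}) = D(\widehat{\mathcal{M}}, \mathcal{M})$. Hence
the conditional probability on the RHS of the above expression is 0. This
together with Theorem 2 yields $\widetilde D(\widetilde{\mathcal{M}}, \mathcal{M}) = O_p(n^{-1/2})$.

One such consistent estimator of $d$ may be defined as $\hat d = \#\{j : \hat\theta_j \ge \epsilon\}$,
where $\epsilon = \epsilon(n) > 0$ satisfies the conditions in Theorem 3 below.

Theorem 3. Let the conditions of Theorem 1 hold. Let $\epsilon \to 0$ and $\epsilon^2 n \to
\infty$ as $n \to \infty$. Then $P(\hat d \ne d) \to 0$.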

4. Numerical properties. 

4.1. Simulations. We illustrate the proposed method first using the simulated
data from model (1.1) with

$X_t(u) = \sum_{i=1}^{d} \xi_{ti}\varphi_i(u), \quad \varepsilon_t(u) = \sum_{j=1}^{10} \frac{Z_{tj}}{2^{j-1}}\zeta_j(u), \quad u \in [0,1],$

where $\{\xi_{ti}, t \ge 1\}$ is a linear AR(1) process with the coefficient $(-1)^i(0.9 -
0.5i/d)$, the innovations $Z_{tj}$ are independent $N(0,1)$ variables and

$\varphi_i(u) = \sqrt{2}\cos(\pi i u), \quad \zeta_j(u) = \sqrt{2}\sin(\pi j u).$

We set sample size $n = 100$, 300 or 600, and the dimension parameter $d = 2$, 4
or 6. For each setting, we repeat the simulation 200 times. We use $p = 5$ in
defining the operator $\widehat K$ in (2.9). For each of the 200 samples, we replicate
the bootstrap sampling 200 times.
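
A data-generating sketch for this simulation design is given below. Several details are assumptions rather than facts from the text: the function name `simulate_curves`, the midpoint grid of `m` points on $[0,1]$, the burn-in length, standard normal innovations for the AR(1) processes, and the reading of the noise weights as $2^{-(j-1)}$.

```python
import numpy as np

def simulate_curves(n, d, m=100, burn_in=200, seed=0):
    """Simulate Y_t = X_t + eps_t on a midpoint grid of [0, 1]."""
    rng = np.random.default_rng(seed)
    u = (np.arange(m) + 0.5) / m
    i = np.arange(1, d + 1)
    coef = (-1.0) ** i * (0.9 - 0.5 * i / d)               # AR(1) coefficients
    xi = np.zeros((n + burn_in, d))
    shocks = rng.standard_normal((n + burn_in, d))          # assumed N(0, 1) innovations
    for t in range(1, n + burn_in):
        xi[t] = coef * xi[t - 1] + shocks[t]
    xi = xi[burn_in:]                                       # drop the burn-in period
    phi = np.sqrt(2) * np.cos(np.pi * np.outer(i, u))       # (d, m) signal basis
    j = np.arange(1, 11)
    zeta = np.sqrt(2) * np.sin(np.pi * np.outer(j, u))      # (10, m) noise basis
    z = rng.standard_normal((n, 10)) / 2.0 ** (j - 1)       # assumed weights 2^{-(j-1)}
    x = xi @ phi                                            # curve process X_t
    eps = z @ zeta                                          # white noise curves
    return u, x + eps, x, eps

u, y, x, eps = simulate_curves(n=100, d=2)
```

The returned array `y` has one observed curve per row and can be passed to any routine that estimates the eigenvalues of the operator in (2.9).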

The averages of the ordered eigenvalues of $\widehat K$ obtained from the 200 replications
are plotted in Figure 1. For a good visual illustration, we only plot
the ten largest eigenvalues. It is clear that the drop from the $d$th largest eigenvalue
to the $(d+1)$st is very pronounced. Furthermore, the estimates for
zero-eigenvalues with different sample sizes are much closer than those for
nonzero eigenvalues. This evidence is in line with the different convergence
rates presented in Theorem 1(ii) and (iii). We apply the bootstrap method
to test the hypothesis that the $d$th or the $(d+1)$st largest eigenvalue of
$K$ ($\theta_d$ and $\theta_{d+1}$, resp.) is 0. The results are summarized in Figure 2. The
bootstrap test cannot reject the true null hypothesis $\theta_{d+1} = 0$. The false null
hypothesis $\theta_d = 0$ is routinely rejected when $n = 600$ or 300; see Figure 2(a).
However, the test does not work when the sample size is as small as 100.

To measure the accuracy of the estimation for the factor loading space $\mathcal{M}$,
we need to modify the metric $D$ defined in (3.2) first, as $\hat d$ may be different
from $d$. Let $\mathcal{N}_1, \mathcal{N}_2$ be two subspaces in $\mathcal{L}_2(\mathcal{I})$ with dimension $d_1$ and $d_2$,
respectively. Let $\{\zeta_{i1},\ldots,\zeta_{id_i}\}$ be an orthonormal basis of $\mathcal{N}_i$, $i = 1, 2$. The
discrepancy measure between the two subspaces is defined as

(4.1) $\widetilde D(\mathcal{N}_1, \mathcal{N}_2) = \sqrt{1 - \frac{1}{\max(d_1, d_2)}\sum_{k=1}^{d_1}\sum_{j=1}^{d_2} (\langle \zeta_{1k}, \zeta_{2j} \rangle)^2}.$

It can be shown that $\widetilde D(\mathcal{N}_1, \mathcal{N}_2) \in [0, 1]$. It equals 0 if and only if $\mathcal{N}_1 = \mathcal{N}_2$,
and 1 if and only if $\mathcal{N}_1 \perp \mathcal{N}_2$. Obviously, $\widetilde D(\mathcal{N}_1, \mathcal{N}_2) = D(\mathcal{N}_1, \mathcal{N}_2)$ when $d_1 =
d_2 = d$. We computed $\widetilde D(\widehat{\mathcal{M}}, \mathcal{M})$ in the 200 replications for each setting.
Figure 3 presents the boxplots of those $\widetilde D$-values. It is noticeable that the
$\widetilde D$ measure decreases as the sample size $n$ increases. It is interesting to note
too that the accuracy of the estimation is independent of the dimension $d$.

[Figure 2: two panels of boxplots, (a) P-values for testing the $d$-th largest eigenvalue and (b) P-values for testing the $(d+1)$-th largest eigenvalue, each for $d = 2, 4, 6$ and $n = 100, 300, 600$.]

Fig. 2. The boxplots of the P-values for the bootstrap tests of the hypothesis that (a)
the dth largest eigenvalue of K is 0, and (b) the (d + 1)th largest eigenvalue of K is 0.
The horizontal lines mark the 1% (dotted line), 5% (solid lines) and 10% (dashed lines)
significance levels, respectively.
[Figure 3: boxplots for $d = 2, 4, 6$, each with $n = 100, 300, 600$.]

Fig. 3. The boxplots for the estimated error $\widetilde D$ defined in (4.1).

To further illustrate the different convergence rates in estimating nonzero
and zero eigenvalues, as stated in Theorem 1, we generate 10,000 samples
with different sample sizes from model (1.1) with $d = 1$, $\xi_t = 0.5\xi_{t-1} + \eta_t$,
where $\eta_t \sim N(0,1)$, $\varphi(u) = \sqrt{2}\cos(\pi u)$, and $\varepsilon_t(\cdot)$ is the same as above. In
defining the operator $K$, we let $p = 1$. Then the operator $K$ has only one
nonzero eigenvalue $\theta = 2$. Figure 4 depicts the standardized histograms and
the kernel density estimators of $\sqrt{n}(\hat\theta_1 - \theta)$, computed from the 10,000 samples.
It is evident that those distributions resemble normal distributions
when the sample size is 200 or greater. This is in line with Theorem 1(ii)
which implies that $\sqrt{n}(\hat\theta_1 - \theta)$ converges to a nondegenerate distribution.

Figure 5 displays the distribution of $\sqrt{n}\hat\theta_2$, noting $\theta_2 = 0$. It is clear that
$\sqrt{n}\hat\theta_2$ converges to zero as $n$ increases, indicating the fact that the normalizing
factor $\sqrt{n}$ is too small to stabilize the distribution. In contrast, Figure 6
exhibits that the distribution of $n\hat\theta_2$ stabilizes from the sample size as small
as $n = 50$; see Theorem 1(iii). In fact, the profile of the distribution with
$n = 10$ looks almost the same as that with $n = 2000$.

Figure 7 displays boxplots of the absolute estimation errors of the eigenvalues.
With the same sample size, the estimation errors for the nonzero
eigenvalue are considerably greater than those for the zero eigenvalue.

[Figure 4: panels for $n = 10, 20, 50, 100$.]

Fig. 4. Standardized histograms overlaid by kernel density estimators of $\sqrt{n}(\hat\theta_1 - \theta)$.

[Figure 5: panels for $n = 10, 20, 50, 100, 200, 500, 1000, 2000$.]

Fig. 5. Standardized histograms overlaid by kernel density estimators of $\sqrt{n}\hat\theta_2$.

4.2. A real data example. To further illustrate the methodology developed
in this paper, we set upon the task of modeling the intraday return
densities for the IBM stock in 2006. To this end, we have obtained the intraday
prices via the WRDS database. We only use prices between 09:30-16:00
since the market is not particularly active outside of these times. There are
n = 251 trading days in the sample and a total of 2,786,650 observations.
The size of this dataset is 73.7 MB.

Since high frequency prices are not equally spaced in time, we compute
the returns using the prices at the so-called previous tick times in every 5
minute intervals. More precisely, we set the sampling times at $T_1 = $ 09:35,
$T_2 = $ 09:40, ..., $T_m = $ 16:00 with $m = 78$. Denote by $X_i(t_{ij})$ the stock price on
the $i$th day at the time $t_{ij}$, $j = 1,\ldots,m$ and $i = 1,\ldots,n$. The previous tick
times on the $i$th day are defined as

$\tau_{il} = \max\{t_{ij} : t_{ij} \le T_l, j = 1,\ldots,m\}, \quad l = 1,\ldots,m.$

The $l$th return on the $i$th day is then defined as $Z_{il} = \log\{X_i(\tau_{il})/X_i(\tau_{i,l-1})\}$.
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n=10 n=20 





f 



n=50 




n=100 



n=200 




t 



n=500 



n=1000 



V' 



\ 



\ 



15 



FN 



n=2000 




Fig. 6. Standardized histograms overlaid by kernel density estimators of n02 
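The previous tick construction of the returns $Z_{il}$ can be sketched in a few lines. This is only an illustration under stated assumptions: times are plain numbers (for example seconds since midnight), the grid includes a starting point $\tau_0$ (taken here to be the market open, which the text does not spell out), and the function name `previous_tick_log_returns` is hypothetical.

```python
import numpy as np

def previous_tick_log_returns(tick_times, tick_prices, grid):
    """Log returns between consecutive grid points using previous-tick prices.

    tick_times  : increasing 1-D array of trade times on one day
    tick_prices : prices observed at tick_times
    grid        : increasing sampling times tau_0 < tau_1 < ... < tau_m
    """
    tick_times = np.asarray(tick_times, dtype=float)
    tick_prices = np.asarray(tick_prices, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Index of the last tick at or before each grid point (the previous tick).
    idx = np.searchsorted(tick_times, grid, side="right") - 1
    if np.any(idx < 0):
        raise ValueError("no tick at or before the first grid point")
    prev_prices = tick_prices[idx]
    # Z_l = log(price at previous tick of tau_l / price at previous tick of tau_{l-1}).
    return np.diff(np.log(prev_prices))

# Toy day with six ticks and two 5-minute intervals (times in seconds).
returns = previous_tick_log_returns(
    [0, 70, 130, 290, 310, 599], [100, 101, 102, 101, 103, 104], [0, 300, 600]
)
```

With 78 five-minute intervals per day, the same call would use a grid of 79 points and return the 78 returns that feed the density estimate for that day.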



Fig. 7. Boxplots of estimation errors: (a) Errors for nonzero eigenvalue
$|\hat\theta_1 - \theta|$; (b) Errors for zero-eigenvalue $\hat\theta_2$.
To add clarity to the display, the outliers are not plotted.

We then estimate the intraday return densities using the standard kernel
method

(4.2)  $Y_i(u) = (m h_i)^{-1} \sum_{j=1}^{m} K\left(\frac{Z_{ij} - u}{h_i}\right)$,  $i = 1, \ldots, n$,

where $K(u) = (\sqrt{2\pi})^{-1}\exp(-u^2/2)$ is a Gaussian kernel and
$h_i$ is a bandwidth. We set $\mathcal{I} = [-0.002, 0.002]$ as the support
for $Y_i(\cdot)$. Let $\hat\sigma_i$ be the sample standard deviation of
$\{Z_{ij}, j = 1, \ldots, m\}$ and $\hat h_i = 1.06\hat\sigma_i m^{-1/5}$ be
Silverman's rule of thumb bandwidth choice for day $i$. Then for each $i$,
we employ three levels of smoothness by setting $h_i$ in (4.2) equal to
$0.5\hat h_i$, $\hat h_i$ and $2\hat h_i$. Figure 8 displays the observed
densities for the first 8 days of the sample.
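As a minimal sketch of the estimator in (4.2), the snippet below evaluates the Gaussian kernel density on a grid over $[-0.002, 0.002]$ at the three bandwidth levels. The returns are synthetic stand-ins (not the IBM data), the function names are hypothetical, and the sample standard deviation is assumed to use the $m - 1$ divisor.

```python
import numpy as np

def silverman_bandwidth(z):
    """Rule-of-thumb bandwidth 1.06 * sd * m^(-1/5) (sd with m - 1 divisor)."""
    z = np.asarray(z, dtype=float)
    return 1.06 * z.std(ddof=1) * z.size ** (-0.2)

def gaussian_kde_curve(z, grid, h):
    """Evaluate (m h)^{-1} sum_j K((z_j - u) / h) at each u in grid."""
    z = np.asarray(z, dtype=float)
    grid = np.asarray(grid, dtype=float)
    scaled = (z[None, :] - grid[:, None]) / h
    kernel = np.exp(-0.5 * scaled ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (z.size * h)

rng = np.random.default_rng(0)
day_returns = rng.normal(scale=5e-4, size=78)   # synthetic stand-in for one day
grid = np.linspace(-0.002, 0.002, 201)
h_hat = silverman_bandwidth(day_returns)
# One curve per smoothness level: 0.5 * h_hat, h_hat and 2 * h_hat.
curves = {c: gaussian_kde_curve(day_returns, grid, c * h_hat) for c in (0.5, 1.0, 2.0)}
```

Stacking one such curve per trading day gives the discretized curve series $Y_1(\cdot), \ldots, Y_n(\cdot)$ on which the eigenanalysis operates.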

To identify the finite dimensionality of $Y_t(\cdot)$, we apply the
methodology developed in this paper. We set $p = 5$ in (2.8). Figure 9
displays the estimated eigenvalues. With all three bandwidths used, the
first two eigenvalues are much larger than the remaining ones. Furthermore,
there is no clear cut-off from the third eigenvalue onwards. This suggests
taking $d = 2$. The bootstrap tests, reported in Table 1, lend further
support to this assertion. Indeed, for all levels of smoothness adopted,
the bootstrap test rejects the null $H_0: \theta_2 = 0$ but cannot reject
the hypothesis $\theta_j = 0$ for $j = 3, 4$ or 5. Note that
$\theta_3 = 0$ implies $\theta_{3+k} = 0$ for $k \ge 1$. Indeed, we tested
$\theta_{3+k} = 0$ only for illustrative purposes.

Fig. 8. Estimated densities, $Y_i(\cdot)$, using bandwidths
$h_i = \hat h_i$ (solid lines), $0.5\hat h_i$ (dashed lines) and
$2\hat h_i$ (dotted lines).

Table 2 contains the P-values from testing the hypothesis that the
estimated loadings, $\hat\eta_{tj}$ in (2.11), are white noise using the
Ljung-Box-Pierce portmanteau test. Although we should interpret the results
of this test with caution (see Remark 3 in Section 2.2.3), they provide
further evidence that there is a considerable amount of dynamic structure
in the two-dimensional subspace corresponding to the first two eigenvalues
$\theta_1$ and $\theta_2$, and there is little or no dynamic structure in
the directions corresponding to $\theta_3$ and $\theta_4$. Collating all
the relevant findings, we comfortably set $d = 2$ in our analysis.
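For reference, the Ljung-Box-Pierce statistic $Q = n(n+2)\sum_{k=1}^{q} r_k^2/(n-k)$, with $r_k$ the lag-$k$ sample autocorrelation, can be computed as in the sketch below. The function name `ljung_box` is hypothetical, and the P-value step (comparison with a $\chi^2_q$ distribution) is left out to keep the example dependent on NumPy only.

```python
import numpy as np

def ljung_box(x, q):
    """Return Q = n (n + 2) * sum_{k=1}^{q} r_k^2 / (n - k) for a 1-D series x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    stat = 0.0
    for k in range(1, q + 1):
        r_k = np.dot(xc[:-k], xc[k:]) / denom   # lag-k sample autocorrelation
        stat += r_k ** 2 / (n - k)
    return n * (n + 2) * stat

# A perfectly alternating series has r_1 = -0.9 when n = 10.
q_stat = ljung_box([1, -1, 1, -1, 1, -1, 1, -1, 1, -1], q=1)
```

Large values of the statistic relative to the $\chi^2_q$ reference distribution speak against the white noise hypothesis for the loading series under test.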

Fig. 9. Estimated eigenvalues $\hat\theta_j$ using bandwidths
$h_t = 0.5\hat h_t$ (solid lines), $\hat h_t$ (dashed lines) and
$2\hat h_t$ (dotted lines). Panels: 10 largest eigenvalues; 2nd to 11th
largest eigenvalues; 3rd to 12th largest eigenvalues.

Table 1
P-values from applying the bootstrap test in Section 2.2.3
to the intraday return density example

                       $h_t = 0.5\hat h_t$   $h_t = \hat h_t$   $h_t = 2\hat h_t$
$H_0: \theta_1 = 0$          0.00                0.00               0.00
$H_0: \theta_2 = 0$          0.00                0.00               0.00
$H_0: \theta_3 = 0$          0.35                0.15               0.18
$H_0: \theta_4 = 0$          0.62                0.73               0.74
$H_0: \theta_5 = 0$          0.68                0.91               0.93


Figure 10 displays the first $d$ ($= 2$) estimated eigenfunctions
$\hat\psi_j$ in (2.13). Although the estimated curves $Y_t(\cdot)$ in
Figure 8 are somewhat different for different bandwidths, the shape of the
estimated eigenfunctions is insensitive to the choice of bandwidth.

Figure 11 displays time series plots of the estimated loadings
$\hat\eta_{t1}$ and $\hat\eta_{t2}$. Again the estimated loadings with
three levels of bandwidth are almost indistinguishable from each other.
Furthermore, the ACF and PACF of the series
$\hat\eta_t = (\hat\eta_{t1}, \hat\eta_{t2})'$ are also virtually identical
for all three choices of $h_t$. These graphics are displayed in Figures 12
and 13.

We now fit a VAR model to the estimated loadings, $\hat\eta_t$:

(4.3)  $\hat\eta_t = \sum_{k=1}^{\tau} A_k \hat\eta_{t-k} + e_t$.

Since the estimated loadings $\hat\eta_{tj}$, as defined in (2.11), have
mean zero by construction, there is no intercept term in the model. We
choose the order $\tau$ in (4.3) by minimizing the AIC. The AIC values for
the order $\tau = 0, 1, \ldots, 10$ are given in Table 3. With all three
bandwidths used, the AIC chooses $\tau = 3$, and the multivariate
portmanteau tests (with lag values 1, 3 and 5) of Li and McLeod (1981) for
the residuals of the fitted VAR models are insignificant at the 10% level.
The Yule-Walker estimates of the parameter matrices,
$A_k = (a_{k,ij})$ in (4.3), with the order $\tau = 3$ are given in Table 4.

Table 2
P-values from testing the hypothesis $H_0$: $\hat\eta_{tj}$ is white noise
using the Ljung-Box-Pierce portmanteau test. The test statistic is given by
$Q_j = n(n+2)\sum_{k=1}^{q} S_j(k)^2/(n-k)$, where $S_j(k)$ is the sample
autocorrelation of $\hat\eta_{tj}$ at lag $k$. Under $H_0$, $Q_j$ has an
asymptotic $\chi^2_q$-distribution

Fig. 10. Estimated eigenfunctions (a) $\hat\psi_1$ and (b) $\hat\psi_2$
using bandwidths $h_t = 0.5\hat h_t$ (solid lines), $\hat h_t$ (dashed
lines) and $2\hat h_t$ (dotted lines).

Fig. 11. Estimated loadings (a) $\hat\eta_{t1}$ and (b) $\hat\eta_{t2}$
using bandwidths $h_t = 0.5\hat h_t$ (solid lines), $\hat h_t$ (dashed
lines) and $2\hat h_t$ (dotted lines).

Fig. 12. ACF of $\hat\eta_t$ using bandwidths $h_t = 0.5\hat h_t$ (solid
lines), $\hat h_t$ (dashed lines) and $2\hat h_t$ (dotted lines).
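The order selection step can be sketched as follows. Note the swaps relative to the text: the sketch fits each VAR by ordinary least squares rather than by the Yule-Walker equations, and it uses the common AIC form $\log\det\hat\Sigma + 2\tau d^2/N$ on a shared effective sample of size $N$. Both are assumptions, as are the function names and the simulated two-dimensional series.

```python
import numpy as np

def fit_var_ls(x, order, max_order):
    """Least-squares VAR(order) fit, no intercept, on t = max_order, ..., n-1.

    Returns (coefs, sigma): coefs[k-1] is the d x d matrix A_k and sigma is
    the residual covariance matrix.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    y = x[max_order:]
    if order == 0:
        resid, coefs = y, np.zeros((0, d, d))
    else:
        # Row for time t holds (x_{t-1}, ..., x_{t-order}).
        lags = np.hstack([x[max_order - k:n - k] for k in range(1, order + 1)])
        beta, *_ = np.linalg.lstsq(lags, y, rcond=None)
        resid = y - lags @ beta
        coefs = beta.T.reshape(d, order, d).transpose(1, 0, 2)
    sigma = resid.T @ resid / y.shape[0]
    return coefs, sigma

def select_var_order(x, max_order):
    """Return the AIC-minimizing order and the AIC values centered at their minimum."""
    x = np.asarray(x, dtype=float)
    n_eff, d = x.shape[0] - max_order, x.shape[1]
    aic = []
    for order in range(max_order + 1):
        _, sigma = fit_var_ls(x, order, max_order)
        _, logdet = np.linalg.slogdet(sigma)
        aic.append(logdet + 2.0 * order * d * d / n_eff)
    aic = np.array(aic)
    return int(np.argmin(aic)), aic - aic.min()

# Simulated mean-zero bivariate VAR(1) standing in for the estimated loadings.
rng = np.random.default_rng(3)
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
x = np.zeros((2000, 2))
for t in range(1, x.shape[0]):
    x[t] = A_true @ x[t - 1] + rng.normal(size=2)
best_order, centered_aic = select_var_order(x, max_order=5)
```

Centering the AIC values at their minimum mirrors the presentation in Table 3, where the selected order shows a value of 0.00.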

To summarize, we found that the dynamic behavior of the IBM intraday
return densities in 2006 was driven by two factors. These factor series
are modeled well by a VAR(3) process. We note that with all three levels
of smoothness adopted in the initial density estimation, these conclusions
were unchanged.

Fig. 13. PACF of $\hat\eta_t$ using bandwidths $h_t = 0.5\hat h_t$ (solid
lines), $\hat h_t$ (dashed lines) and $2\hat h_t$ (dotted lines).

Finally, we make a cautionary remark on the implied true curves
$X_t(\cdot)$ in the above analysis. We take the unknown true daily
densities as $X_t(\cdot)$. We see those densities as random curves, as the
distribution of the intraday returns tomorrow depends on the distributions
of today, yesterday and so on, but is not entirely determined by them. Now
in model (1.1),
$E\{\varepsilon_t(u)\} = E\{Y_t(u)\} - E\{X_t(u)\} \ne 0$. But this does
not affect the analysis performed in identifying the dimensionality of the
curves; see also Pan and Yao (2008). Note that (2.10) provides an
alternative estimator for the true density $X_t(\cdot)$ based on the
dynamic structure of the curve series. It can be used, for example, to
forecast the density for tomorrow. However, an obvious normalization
should be applied since we did not make use of the constraint
$\int_{\mathcal{I}} X_t(u)\,du = 1$ in constructing (2.10).
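One simple way to impose the constraint $\int_{\mathcal{I}} X_t(u)\,du = 1$ on a curve evaluated on a grid is sketched below. The text does not specify the normalization, so this is only one plausible reading: negative values are clipped to zero first (an added assumption) and the curve is then rescaled by its trapezoidal integral. The function name is hypothetical.

```python
import numpy as np

def normalize_density(values, grid):
    """Clip negative values and rescale so the curve integrates to one on grid."""
    values = np.clip(np.asarray(values, dtype=float), 0.0, None)
    grid = np.asarray(grid, dtype=float)
    dx = np.diff(grid)
    area = np.sum(0.5 * (values[1:] + values[:-1]) * dx)   # trapezoidal rule
    if area <= 0:
        raise ValueError("curve has no positive mass on the grid")
    return values / area

grid = np.linspace(-0.002, 0.002, 101)
raw = 3.0 * np.exp(-(grid / 5e-4) ** 2) - 0.1   # unnormalized curve with small negative tails
density = normalize_density(raw, grid)
```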

Table 3
AIC values from fitting the VAR model in (4.3). The figures in this table
have been centered at the minimum AIC value

                      $\tau = 0$   $\tau = 1$   $\tau = 2$   $\tau = 3$   $\tau = 4$   $\tau = 5$
$h_t = 0.5\hat h_t$     131.33       40.39        9.98         0.00         7.86        10.38
$h_t = \hat h_t$        133.04       41.32        9.53         0.00         7.47        10.08
$h_t = 2\hat h_t$       135.47       40.83        9.58         0.00         7.00         8.94

APPENDIX A

In this section, we provide the relevant background on operator theory
used in this work. More detailed accounts may be found in Dunford and
Schwartz (1988).

Let $\mathcal{H}$ be a real separable Hilbert space with respect to some
inner product $\langle\cdot,\cdot\rangle$. For any
$V \subset \mathcal{H}$, the orthogonal complement of $V$ is given by

  $V^{\perp} = \{x \in \mathcal{H} : \langle x, y\rangle = 0,\ \forall y \in V\}$.

Note that $V^{\perp\perp} = \overline{V}$ where $\overline{V}$ denotes the
closure of $V$. Clearly, if $V$ is finite dimensional then
$V^{\perp\perp} = V$.

Let $L$ be a linear operator from $\mathcal{H}$ to $\mathcal{H}$. For
$x \in \mathcal{H}$, denote by $Lx$ the image of $x$ under $L$. The adjoint
of $L$ is denoted by $L^*$ and satisfies

(A.1)  $\langle Lx, y\rangle = \langle x, L^*y\rangle$,  $x, y \in \mathcal{H}$.

$L$ is said to be self adjoint if $L^* = L$ and nonnegative definite if

  $\langle Lx, x\rangle \ge 0 \quad \forall x \in \mathcal{H}$.

The image and null space of $L$ are defined as
$\mathrm{Im}(L) = \{y \in \mathcal{H} : y = Lx,\ x \in \mathcal{H}\}$ and
$\mathrm{Ker}(L) = \{x \in \mathcal{H} : Lx = 0\}$, respectively. Note that
$\mathrm{Ker}(L^*) = (\mathrm{Im}(L))^{\perp}$,
$\mathrm{Ker}(L) = (\mathrm{Im}(L^*))^{\perp}$ and
$\mathrm{Ker}(L^*) = \mathrm{Ker}(LL^*)$. We define the rank of $L$ to be
$r(L) = \dim(\mathrm{Im}(L))$ and we say that $L$ is finite dimensional if
$r(L) < \infty$.


A linear operator $L$ is said to be bounded if there exists some finite
constant $A > 0$ such that for all $x \in \mathcal{H}$

  $\|Lx\| \le A\|x\|$,

where $\|\cdot\|$ is the norm induced on $\mathcal{H}$ by
$\langle\cdot,\cdot\rangle$. We denote the space of bounded linear
operators from $\mathcal{H}$ to $\mathcal{H}$ by
$\mathcal{B} = \mathcal{B}(\mathcal{H}, \mathcal{H})$ and the uniform
topology on $\mathcal{B}$ is defined by

  $\|L\|_{\mathcal{B}} = \sup_{\|x\| \le 1} \|Lx\|$,  $L \in \mathcal{B}$.

Note that all bounded linear operators are continuous, and the converse
also holds.

An operator $L \in \mathcal{B}$ is said to be compact if there exist two
orthonormal sequences $\{e_j\}$ and $\{f_j\}$ of $\mathcal{H}$ and a
sequence of scalars $\{\lambda_j\}$ decreasing to zero such that

  $Lx = \sum_{j=1}^{\infty} \lambda_j \langle e_j, x\rangle f_j$,  $x \in \mathcal{H}$,

or more compactly

(A.2)  $L = \sum_{j=1}^{\infty} \lambda_j\, e_j \otimes f_j$.

Note that if $\mathcal{H} = L_2(\mathcal{I})$ equipped with the inner
product defined in (2.1), then

  $(Lx)(u) = \sum_{j=1}^{\infty} \lambda_j \langle e_j, x\rangle f_j(u)$.

Clearly, $\mathrm{Im}(L) = \mathrm{sp}\{f_j : j \ge 1\}$ and
$\mathrm{Ker}(L) = \mathrm{sp}\{e_j : j \ge 1\}^{\perp}$.

The Hilbert-Schmidt norm of a compact linear operator $L$ is defined as
$\|L\|_{\mathcal{S}} = (\sum_{j=1}^{\infty} \lambda_j^2)^{1/2}$. We will
let $\mathcal{S}$ denote the space consisting of all the operators with a
finite Hilbert-Schmidt or nuclear norm. Clearly, we have the inequality
$\|\cdot\|_{\mathcal{S}} \ge \|\cdot\|_{\mathcal{B}}$, and thus the
inclusion $\mathcal{S} \subset \mathcal{B}$. Note that $\mathcal{B}$ is a
Banach space when equipped with its norm. Furthermore, $\mathcal{S}$ is a
Hilbert space with respect to the inner product

  $\langle L_1, L_2\rangle_{\mathcal{S}} = \sum_{i,j=1}^{\infty} \langle L_1 g_i, h_j\rangle \langle L_2 g_i, h_j\rangle$,  $L_1, L_2 \in \mathcal{S}$,

where $\{g_i\}$ and $\{h_j\}$ are any orthonormal bases of $\mathcal{H}$.

Table 4
Estimated parameter matrices $A_k = (a_{k,ij})$ from fitting the VAR model
in (4.3)

                          $j = 1$                                   $j = 2$
$h_t$         $0.5\hat h_t$   $\hat h_t$   $2\hat h_t$   $0.5\hat h_t$   $\hat h_t$   $2\hat h_t$
$a_{1,1j}$        0.08          0.07         0.01          -0.14          -0.16        -0.22
$a_{1,2j}$       -0.08         -0.05         0.03           0.24           0.26         0.33
$a_{2,1j}$        0.35

APPENDIX B 

In this section, we provide the proofs for the propositions in Section 2 and 
the theorems in Section 3. Throughout the proofs, we may use C to denote 
some (generic) positive and finite constant which may vary from line to line. 
We introduce some technical lemmas first. 

Lemma 1. Let $L$ be a finite-dimensional operator such that for some
sequences of orthonormal vectors $\{e_j\}$, $\{f_j\}$, $\{g_j\}$ and
$\{h_j\}$ and some sequences of decreasing scalars $\{\theta_j\}$ and
$\{\lambda_j\}$, $L$ admits the spectral decompositions
$L = \sum_{j=1}^{d} \theta_j\, e_j \otimes f_j = \sum_{j=1}^{d'} \lambda_j\, g_j \otimes h_j$.
Then it holds that $d' = d$.

Proof. Note that if $d \ne d'$ then both $\mathrm{Im}(L)$ and
$\mathrm{Im}(L^*)$ will be of different dimensions under the alternative
characterizations due to linear independence of $\{e_j\}$, $\{f_j\}$,
$\{g_j\}$ and $\{h_j\}$. Thus, it must hold that $d = d'$.
□

Lemma 2. Let $L$ be a linear operator from $\mathcal{H}$ to $\mathcal{H}$,
where $\mathcal{H}$ is a separable Hilbert space. Then it holds that
$\overline{\mathrm{Im}(LL^*)} = \overline{\mathrm{Im}(L)}$.

Proof. Using the facts about inner product spaces and linear operators
stated in Appendix A, we have

  $\overline{\mathrm{Im}(LL^*)} = (\mathrm{Im}(LL^*))^{\perp\perp} = (\mathrm{Im}((LL^*)^*))^{\perp\perp}$
  $= (\mathrm{Ker}(LL^*))^{\perp} = (\mathrm{Ker}(L^*))^{\perp}$
  $= (\mathrm{Im}(L))^{\perp\perp} = \overline{\mathrm{Im}(L)}$,

which concludes the proof. □

For the sake of simplicity in the presentation of the proofs, we adopt
the standard notation for Hilbert spaces. For any
$f \in L_2(\mathcal{I})$, we write $\|f\| = \sqrt{\langle f, f\rangle}$
[see (2.1)], and denote by $M_k f \in L_2(\mathcal{I})$ the image of $f$
under the operator $M_k$ in the sense that

  $(M_k f)(u) = \int_{\mathcal{I}} M_k(u, v) f(v)\,dv$.

The operators $N_k$, $K$, $\hat M_k$ and $\hat K$ may be expressed in the
same manner. Note now that the adjoint operator of $M_k$ is

  $(M_k^* f)(u) = \int_{\mathcal{I}} M_k(v, u) f(v)\,dv$.

See (A.1). Furthermore, $N_k = M_k M_k^*$ in the sense that
$N_k f = M_k M_k^* f$; see (2.6). Similarly,
$K = \sum_{k=1}^{p} M_k M_k^*$; see (2.9).
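On a grid, the integral operator, its adjoint and the composition $N_k = M_k M_k^*$ reduce to matrix operations, which gives a quick numerical check of (A.1) and of Lemma 2 in a finite-dimensional case. The sketch below uses a hypothetical rank-2 kernel on $[0, 1]$ and a plain Riemann sum with weight `w`; none of these choices come from the paper.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 200)
w = grid[1] - grid[0]                      # quadrature weight (Riemann sum)

def inner(f, g):
    """Approximate <f, g> = int f(u) g(u) du on the grid."""
    return w * np.dot(f, g)

def apply_kernel(kernel, f):
    """Approximate (M f)(u) = int M(u, v) f(v) dv."""
    return w * kernel @ f

phi1 = np.sqrt(2.0) * np.cos(np.pi * grid)
phi2 = np.sqrt(2.0) * np.cos(2.0 * np.pi * grid)
# Hypothetical non-symmetric kernel M(u, v) = sum_ij s_ij phi_i(u) phi_j(v).
s = np.array([[0.8, 0.3], [-0.2, 0.5]])
basis = np.vstack([phi1, phi2])
M = basis.T @ s @ basis
M_adj = M.T                                # kernel of the adjoint: M*(u, v) = M(v, u)

rng = np.random.default_rng(1)
f, g = rng.normal(size=grid.size), rng.normal(size=grid.size)
lhs = inner(apply_kernel(M, f), g)         # <M f, g>
rhs = inner(f, apply_kernel(M_adj, g))     # <f, M* g>

N = w * M @ M_adj                          # kernel of N = M M*
rank_M = np.linalg.matrix_rank(M)
rank_N = np.linalg.matrix_rank(N)          # equal ranks: Im(M M*) and Im(M) coincide here
```

Because the coefficient matrix `s` is invertible, both ranks equal 2, the dimension of the span of the two basis functions.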

Proof of Proposition 1. (i) To save notational burden, we set
$k = k_0$. We only need to show $\mathrm{Im}(N_k) = \mathcal{M}$. Since
$N_k = M_k M_k^*$, it follows from Lemma 2 that
$\mathrm{Im}(N_k) = \mathrm{Im}(M_k M_k^*) = \mathrm{Im}(M_k)$ as $N_k$ and
$M_k$ are finite dimensional and thus their images are closed.

Now, recall from Section 2.1 that $M_k$ may be decomposed as

(B.1)  $M_k = \sum_{i,j=1}^{d} \sigma_{ij}^{(k)}\, \varphi_i \otimes \varphi_j$.

See also (A.2). Thus, from (B.1), we may write

(B.2)  $M_k = \sum_{i=1}^{d} \lambda_i^{(k)}\, \varphi_i \otimes \rho_i^{(k)}$,

where

  $\rho_i^{(k)} = \frac{\sum_{j=1}^{d} \sigma_{ij}^{(k)} \varphi_j}{\|\sum_{j=1}^{d} \sigma_{ij}^{(k)} \varphi_j\|}$  and  $\lambda_i^{(k)} = \Big\|\sum_{j=1}^{d} \sigma_{ij}^{(k)} \varphi_j\Big\|$.

From (B.1), it is clear that $\mathrm{Im}(M_k) \subseteq \mathcal{M}$,
which is finite dimensional. Thus, $M_k$ is compact and therefore admits a
spectral decomposition of the form

(B.3)  $M_k = \sum_{j=1}^{d_k} \theta_j^{(k)}\, \phi_j^{(k)} \otimes \psi_j^{(k)}$

with $(\phi_j^{(k)}, \psi_j^{(k)})$ forming the adjoint pair of singular
functions of $M_k$ corresponding to the singular value $\theta_j^{(k)}$.
Clearly, $d_k \le d$. Thus, if $d_k < d$, $\mathrm{Im}(M_k)$ is a proper
subset of $\mathcal{M}$ since from (B.3),
$\mathrm{Im}(M_k) = \mathrm{span}\{\phi_j^{(k)} : j = 1, \ldots, d_k\}$ and
any subset of $d_k < d$ linearly independent elements in a $d$-dimensional
space can only span a proper subset of the original space.

Now to complete the proof, we only need to show that the set of
$\{\rho_i^{(k)}\}$ in (B.2) is linearly independent for some $k$. If this
can be done, then we are in a position to apply Lemma 1. Let $\beta$ be an
arbitrary vector in $\mathbb{R}^d$ and put
$\varphi = (\varphi_1, \ldots, \varphi_d)'$ and
$\rho_k = (\rho_1^{(k)}, \ldots, \rho_d^{(k)})'$; then the linear
independence of the set $\{\rho_i^{(k)}\}$ can easily be seen as the
equation

  $\beta'\rho_k = \beta'\Sigma_k\varphi = 0$

has a nontrivial solution if and only if $\beta'\Sigma_k = 0$. However,
since $\Sigma_k$ is of full rank by assumption, it follows that it is
invertible and the only solution is the trivial one $\beta = 0$. Thus,
Lemma 1 implies $d_k = d$ and the result follows from noting that any
linearly independent set of $d$ elements in a $d$-dimensional vector space
forms a basis for that space.

(ii) Similar to the proof of part (i) above, we only need to show
$\mathrm{Im}(K) = \mathcal{M}$. Note that for any
$f \in L_2(\mathcal{I})$,
$\langle M_k M_k^* f, f\rangle = \langle M_k^* f, M_k^* f\rangle = \|M_k^* f\|^2 \ge 0$,
thus the composition $N_k = M_k M_k^*$ is nonnegative definite, which
implies that $K$ is also nonnegative definite. Therefore,
$\mathrm{Im}(K) = \bigcup_{k=1}^{p} \mathrm{Im}(N_k)$. From here, the
result given in part (i) of the proposition concludes the proof. □

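Proof of Proposition 2. Let $\hat\theta_j$ be a nonzero eigenvalue of
$K^*$, and $\gamma_j = (\gamma_{1j}, \ldots, \gamma_{n-p,j})'$ be the
corresponding eigenvector, that is, $K^*\gamma_j = \gamma_j\hat\theta_j$.
Writing this equation component by component, we obtain that

(B.4)  $\frac{1}{(n-p)^2} \sum_{i,s=1}^{n-p} \sum_{k=1}^{p} \langle Y_{t+k} - \bar Y, Y_{s+k} - \bar Y\rangle \langle Y_s - \bar Y, Y_i - \bar Y\rangle \gamma_{ij} = \hat\theta_j \gamma_{tj}$

for $t = 1, \ldots, n-p$; see (2.12). For $\hat\psi_j$ defined in (2.13),

  $(\hat K\hat\psi_j)(u) = \int_{\mathcal{I}} \hat K(u, v)\hat\psi_j(v)\,dv$

  $= \frac{1}{(n-p)^2} \sum_{t,s=1}^{n-p} \sum_{k=1}^{p} \int_{\mathcal{I}} \{Y_t(u) - \bar Y(u)\}\{Y_s(v) - \bar Y(v)\}\hat\psi_j(v)\,dv \times \langle Y_{t+k} - \bar Y, Y_{s+k} - \bar Y\rangle$

  $= \frac{1}{(n-p)^2} \sum_{t,s,i=1}^{n-p} \sum_{k=1}^{p} \{Y_t(u) - \bar Y(u)\}\gamma_{ij} \langle Y_s - \bar Y, Y_i - \bar Y\rangle \times \langle Y_{t+k} - \bar Y, Y_{s+k} - \bar Y\rangle$;

see (2.9). Plugging (B.4) into the right-hand side of the above expression,
we obtain that

  $(\hat K\hat\psi_j)(u) = \sum_{t=1}^{n-p} \{Y_t(u) - \bar Y(u)\}\gamma_{tj}\hat\theta_j = \hat\theta_j\hat\psi_j(u)$,

that is, $\hat\psi_j$ is an eigenfunction of $\hat K$ corresponding to the
eigenvalue $\hat\theta_j$. □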
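The duality in Proposition 2 can be checked numerically on discretized curves. The sketch below builds the $(n-p) \times (n-p)$ matrix implied by (B.4) and the grid version of $\hat K$, then compares their leading eigenvalues. The data are synthetic, $\bar Y$ is assumed to be the average over all $n$ curves, inner products are approximated by a Riemann sum with weight `w`, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, G = 40, 3, 120
grid = np.linspace(0.0, 1.0, G)
w = grid[1] - grid[0]                                   # quadrature weight

# Synthetic curves (rows) standing in for Y_1, ..., Y_n.
xi = np.zeros(n)
for t in range(1, n):
    xi[t] = 0.5 * xi[t - 1] + rng.normal()
Y = np.outer(xi, np.sqrt(2.0) * np.cos(np.pi * grid)) + 0.3 * rng.normal(size=(n, G))
Yc = Y - Y.mean(axis=0)                                 # Y_t - Ybar

m = n - p
Y0 = Yc[:m]
gram0 = w * Y0 @ Y0.T                                   # <Y_s - Ybar, Y_i - Ybar>
# S[t, s] = sum_k <Y_{t+k} - Ybar, Y_{s+k} - Ybar>
S = sum(w * Yc[k:k + m] @ Yc[k:k + m].T for k in range(1, p + 1))

K_star = S @ gram0 / m ** 2                             # small (n-p) x (n-p) matrix
K_hat = w * Y0.T @ S @ Y0 / m ** 2                      # discretized operator, G x G

eig_small = np.sort(np.linalg.eigvals(K_star).real)[::-1]
eig_large = np.sort(np.linalg.eigvalsh((K_hat + K_hat.T) / 2.0))[::-1]
```

The leading entries of `eig_small` and `eig_large` agree up to rounding, which is why the eigenanalysis can be carried out on the small matrix regardless of how fine the grid is.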
As we shall see, the operator
$\hat K = \sum_{k=1}^{p} \hat M_k \hat M_k^*$ may be written as a
functional of empirical distributions of Hilbertian random variables. Thus,
we require an auxiliary result to deal with this form of process. To this
end, we extend the V-statistic results of Sen (1972) to the setting of
Hilbertian valued random variables. Further details about V-statistics may
be found in Lee (1990).

Let $\mathcal{H}$ be a real separable Hilbert space with norm
$\|\cdot\|$ generated by an inner product $\langle\cdot,\cdot\rangle$. Let
$X_t \in \mathcal{X}$ be a sequence of strictly stationary and Hilbertian
random variables whose distribution functions will be denoted by $P(x)$,
$x \in \mathcal{X}$. Note that the spaces $\mathcal{X}$ and $\mathcal{H}$
may differ. Let $\phi: \mathcal{X}^m \to \mathcal{H}$ be Bochner
integrable and symmetric in each of its $m$ ($\ge 2$) arguments. Now
consider the functional

  $\theta(P) = \int_{\mathcal{X}^m} \phi(x_1, \ldots, x_m) \prod_{j=1}^{m} P(dx_j)$,

defined over $\mathcal{P} = \{P : \|\theta(P)\| < \infty\}$. As an
estimator of $\theta(P)$, consider the V-statistic defined by

  $V_n = n^{-m} \sum_{i_1=1}^{n} \cdots \sum_{i_m=1}^{n} \phi(X_{i_1}, \ldots, X_{i_m})$.

Now for $c = 0, 1, \ldots, m$, we define the functions

  $\phi_c(x_1, \ldots, x_c) = \int_{\mathcal{X}^{m-c}} \phi(x_1, \ldots, x_c, x_{c+1}, \ldots, x_m) \prod_{j=c+1}^{m} P(dx_j)$

and

  $g_c(x_1, \ldots, x_c) = \sum_{d=0}^{c} (-1)^{c-d} \sum_{1 \le j_1 < \cdots < j_d \le c} \phi_d(x_{j_1}, \ldots, x_{j_d})$.

In order to construct the canonical decomposition of $V_n$, we use Dirac's
$\delta$-measure to define the empirical measure $P_n$ as follows:

  $P_n(A) = n^{-1}(\delta_{X_1}(A) + \cdots + \delta_{X_n}(A))$,  $A \subseteq \mathcal{X}$.

Then for $c = 1, \ldots, m$, we set

  $V_{nc} = \int_{\mathcal{X}^c} \phi_c(x_1, \ldots, x_c) \prod_{j=1}^{c} (P_n(dx_j) - P(dx_j))$
  $= n^{-c} \sum_{i_1=1}^{n} \cdots \sum_{i_c=1}^{n} g_c(X_{i_1}, \ldots, X_{i_c})$,

then we have

(B.5)  $V_n - \theta(P) = \sum_{c=1}^{m} \binom{m}{c} V_{nc}$.

In particular, note that

  $V_{n1} = \frac{1}{n} \sum_{i=1}^{n} g_1(X_i)$.

Decomposition (B.5) is the Hoeffding representation of the statistic
$V_n$. It plays a central role in the proof of Lemma 3 below. We are now in
a position to state some regularity conditions which form the basis of the
result.

• A1. $\{X_t\}$ is strictly stationary and $\psi$-mixing with
$\psi$-mixing coefficients satisfying the condition
$\sum_{l=1}^{\infty} l^{m-1}\psi^{1/2}(l) < \infty$.

• A2. $\int_{\mathcal{X}^m} \|\phi(x_1, \ldots, x_m)\|^2 \prod_{j=1}^{m} P(dx_j) < \infty$.

• A3. $E\|g_1(X_1)\|^2 + 2\sum_{k=2}^{\infty} E\langle g_1(X_1), g_1(X_k)\rangle \ne 0$.

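Lemma 3. Let conditions A1-A3 hold. Then for $c = 1, \ldots, m$ it holds
that $E\|V_{nc}\|^2 = O(n^{-c})$.

Proof. We make use of (B.5). Let $\{e_j : j \ge 1\}$ be an orthonormal
basis of $\mathcal{H}$. Then

(B.6)  $E\|V_{nc}\|^2 = \sum_{j=1}^{\infty} E\langle e_j, V_{nc}\rangle^2$,

where $\langle e_j, V_{nc}\rangle$ is the $\mathbb{R}$ valued V-statistic

  $\langle e_j, V_{nc}\rangle = n^{-c} \sum_{i_1=1}^{n} \cdots \sum_{i_c=1}^{n} \langle e_j, g_c(X_{i_1}, \ldots, X_{i_c})\rangle$.

Now under conditions A1-A3, Lemma 3.3 in Sen (1972) yields

(B.7)  $E\langle e_j, V_{nc}\rangle^2 \le C n^{-c} \int_{\mathcal{X}^c} \langle e_j, \phi_c(x_1, \ldots, x_c)\rangle^2 \prod_{i=1}^{c} P(dx_i)$

for all $j \ge 1$. Now inserting the estimate in (B.7) into (B.6) yields

  $E\|V_{nc}\|^2 \le C n^{-c} \sum_{j=1}^{\infty} \int_{\mathcal{X}^c} \langle e_j, \phi_c(x_1, \ldots, x_c)\rangle^2 \prod_{i=1}^{c} P(dx_i)$
  $\le C n^{-c} \int_{\mathcal{X}^c} \|\phi_c(x_1, \ldots, x_c)\|^2 \prod_{i=1}^{c} P(dx_i)$
  $\le C n^{-c} \int_{\mathcal{X}^m} \|\phi(x_1, \ldots, x_m)\|^2 \prod_{j=1}^{m} P(dx_j)$
  $= O(n^{-c})$,

as required. □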
Proof of Theorem 1. (i) Since $p$ is fixed and finite, we may set
$n \equiv n - p$. Let
$Z_{tk} = (Y_t - \mu) \otimes (Y_{t+k} - \mu) \in \mathcal{S}$. Now
consider the kernel
$\rho: \mathcal{S} \times \mathcal{S} \to \mathcal{S}$ given by

(B.8)  $\rho(A, B) = AB^*$,  $A, B \in \mathcal{S}$.

Now note that from (B.8),

  $\hat M_k \hat M_k^* = n^{-2} \sum_{i=1}^{n} \sum_{j=1}^{n} \rho(Z_{ik}, Z_{jk})$,

which in light of the preceding discussion is simply an $\mathcal{S}$
valued von Mises functional. When $d \ge 1$ it holds that $M_k \ne 0$, and
an application of Lemma 3 yields

(B.9)  $E\|\hat M_k \hat M_k^* - M_k M_k^*\|_{\mathcal{S}}^2 = O(n^{-1})$.

Note that if $d = 0$, the rate in (B.9) would be $n^{-2}$, that is, the
kernel $\rho$ would possess the property of first order degeneracy. Now by
(B.9) and the Chebyshev inequality, we have

  $\|\hat K - K\|_{\mathcal{S}} \le \sum_{k=1}^{p} \|\hat M_k \hat M_k^* - M_k M_k^*\|_{\mathcal{S}} = O_p(n^{-1/2})$.

(ii) Given $\|\hat K - K\|_{\mathcal{S}} = O_p(n^{-1/2})$, Lemma 4.2 in
Bosq (2000) implies that
$\sup_{j \ge 1} |\hat\theta_j - \theta_j| \le \|\hat K - K\|_{\mathcal{S}} = O_p(n^{-1/2})$.
Condition C3 ensures that $\psi_j$ is an identifiable statistical
parameter for $j = 1, \ldots, d$. From here, Lemma 4.3 in Bosq (2000)
implies
$\|\hat\psi_j - \psi_j\| \le C\|\hat K - K\|_{\mathcal{S}} = O_p(n^{-1/2})$.

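(iii) First, note that by Lemma 3 we have

(B.10)  $E\|\hat M_k \hat M_k^* - \tilde M_k \tilde M_k^*\|_{\mathcal{S}}^2 = O(n^{-2})$.

Put $\tilde K = \sum_{k=1}^{p} \tilde M_k \tilde M_k^*$. Then by (B.10) and
the Chebyshev inequality, we have

(B.11)  $\|\hat K - \tilde K\|_{\mathcal{S}} \le \sum_{k=1}^{p} \|\hat M_k \hat M_k^* - \tilde M_k \tilde M_k^*\|_{\mathcal{S}} = O_p(n^{-1})$.

The estimate in (B.11) will prove to be crucial in deriving the results
for $\hat\theta_j$ when $j \ge d+1$.

Now, extend $\psi_1, \ldots, \psi_d$ to a complete orthonormal basis of
$\mathcal{H}$. Then it holds that

(B.12)  $\sum_{j=1}^{n} \hat\theta_j = \sum_{j=1}^{\infty} \langle \psi_j, \hat K\psi_j\rangle$,

and by recalling that $\theta_j = 0$ for $j > d$

(B.13)  $\sum_{j=1}^{d} \theta_j = \sum_{j=1}^{d} \langle \psi_j, K\psi_j\rangle$.

Note that $\mathrm{span}\{\psi_j : j > d\} = \mathcal{M}^{\perp}$ and
$K\psi_j = 0$ for all $j > d$ since
$\mathrm{Ker}(K) = \mathcal{M}^{\perp}$. Thus, from (B.12) and (B.13), we
have

(B.14)  $\sum_{j=1}^{n} (\hat\theta_j - \theta_j) = \sum_{j=1}^{\infty} \langle \psi_j, (\hat K - K)\psi_j\rangle$.

Now we will show that

(B.15)  $\hat\theta_j - \theta_j = \langle \psi_j, (\hat K - K)\psi_j\rangle + O_p(n^{-1})$,  $j = 1, \ldots, d$.

Let $\kappa_j = \langle \hat\psi_j, (\hat K - K)\psi_j\rangle$. Then using
the relations $K\psi_j = \theta_j\psi_j$ and
$\hat K\hat\psi_j = \hat\theta_j\hat\psi_j$ along with the fact that
$\hat K$ is self adjoint, we have

  $|\kappa_j - (\hat\theta_j - \theta_j)| = |\langle \hat\psi_j, \hat K\psi_j\rangle - \langle \hat\psi_j, K\psi_j\rangle - (\hat\theta_j - \theta_j)|$
(B.16)  $= |(\hat\theta_j - \theta_j)(\langle \hat\psi_j, \psi_j\rangle - 1)|$
  $= |\hat\theta_j - \theta_j|\,|\langle \hat\psi_j, \psi_j\rangle - 1|$.

Note that

(B.17)  $|\langle \hat\psi_j, \psi_j\rangle - 1| = |\langle \psi_j, \hat\psi_j - \psi_j\rangle| \le \|\psi_j\|\,\|\hat\psi_j - \psi_j\| = \|\hat\psi_j - \psi_j\|$.

Thus, from the results in (ii) above, (B.16) and (B.17), we have
$|\kappa_j - (\hat\theta_j - \theta_j)| \le |\hat\theta_j - \theta_j|\,\|\hat\psi_j - \psi_j\| = O_p(n^{-1})$
for $j = 1, \ldots, d$.
Next, we have

  $|\langle \psi_j, (\hat K - K)\psi_j\rangle - \kappa_j| = |\langle \psi_j - \hat\psi_j, (\hat K - K)\psi_j\rangle|$
  $\le \|\psi_j - \hat\psi_j\|\,\|(\hat K - K)\psi_j\|$
  $\le \|\psi_j - \hat\psi_j\|\,\|\hat K - K\|_{\mathcal{S}}$,

from which the results in (i) and (ii) yield
$|\langle \psi_j, (\hat K - K)\psi_j\rangle - \kappa_j| = O_p(n^{-1})$,
thus proving (B.15).

Now from (B.15) we have

  $\sum_{j=1}^{d} (\hat\theta_j - \theta_j) = \sum_{j=1}^{d} \langle \psi_j, (\hat K - K)\psi_j\rangle + O_p(n^{-1})$,

and thus from (B.11) and (B.14)

  $\sum_{j=d+1}^{n} \hat\theta_j = \sum_{j=d+1}^{\infty} \langle \psi_j, (\hat K - K)\psi_j\rangle + O_p(n^{-1})$
  $= \sum_{j=d+1}^{\infty} \langle \psi_j, (\tilde K - K)\psi_j\rangle + O_p(n^{-1})$.

By noting that $\psi_j \in \mathcal{M}^{\perp}$ for $j \ge d+1$ and
$\mathrm{Ker}(M_k) = \mathrm{Ker}(K) = \mathrm{Ker}(\tilde K) = \mathcal{M}^{\perp}$,
it holds that
$\sum_{j=d+1}^{\infty} \langle \psi_j, (\tilde K - K)\psi_j\rangle = 0$.
Thus, $\sum_{j=d+1}^{n} \hat\theta_j = O_p(n^{-1})$ and the result follows
from noting that $\hat\theta_i \le \sum_{j=d+1}^{n} \hat\theta_j$ for
$i \ge d+1$.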

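(iv) Let $\Pi_{\mathcal{M}}$ and $\Pi_{\mathcal{M}^{\perp}}$ denote the
projection operators onto $\mathcal{M}$ and $\mathcal{M}^{\perp}$,
respectively. Since
$x = \Pi_{\mathcal{M}}(x) + \Pi_{\mathcal{M}^{\perp}}(x)$ for any
$x \in L_2(\mathcal{I})$, we have

(B.18)  $\|\Pi_{\mathcal{M}}(\hat\psi_i)\|^2 = \|\hat\psi_i - \Pi_{\mathcal{M}^{\perp}}(\hat\psi_i)\|^2 = \sum_{j=1}^{d} \langle \hat\psi_i, \psi_j\rangle^2$

for all $i \ge 1$. Now note that for $i \ge d+1$

  $\|K(\hat\psi_i)\| = \|(K - \hat K)(\hat\psi_i) + \hat\psi_i\hat\theta_i\|$
(B.19)  $\le \|(K - \hat K)(\hat\psi_i)\| + |\hat\theta_i|\,\|\hat\psi_i\|$
  $\le 2\|K - \hat K\|_{\mathcal{B}}$,

where the final inequality follows from the definition of
$\|\cdot\|_{\mathcal{B}}$ and Lemma 4.2 in Bosq (2000) by noting that
$\theta_i = 0$ for all $i \ge d+1$.
Next, we have for $i \ge d+1$

(B.20)  $\|K(\hat\psi_i)\|^2 = \Big\|\sum_{j=1}^{\infty} \theta_j \langle \hat\psi_i, \psi_j\rangle \psi_j\Big\|^2 = \sum_{j=1}^{\infty} \theta_j^2 \langle \hat\psi_i, \psi_j\rangle^2$
  $= \sum_{j=1}^{d} \theta_j^2 \langle \hat\psi_i, \psi_j\rangle^2 \ge \theta_d^2 \sum_{j=1}^{d} \langle \hat\psi_i, \psi_j\rangle^2$,

since $\theta_1 \ge \cdots \ge \theta_d$. Combining (B.18), (B.19) and
(B.20) yields, for $i \ge 1$,

  $\|\Pi_{\mathcal{M}}(\hat\psi_{d+i})\|^2 = \|\hat\psi_{d+i} - \Pi_{\mathcal{M}^{\perp}}(\hat\psi_{d+i})\|^2 \le C\|K - \hat K\|_{\mathcal{B}}^2$,

from which (i) yields the result. □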

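Lemma 4. The function $D$ defined in (3.2) is a well-defined distance
measure on $\mathcal{Z}_D$.

Proof. Nonnegativity, symmetry and the identity of indiscernibles are
obvious. It only remains to prove the subadditivity property. For any
$L \in \mathcal{S}$, note that
$\|L\|_{\mathcal{S}} = \sqrt{\mathrm{tr}(L^*L)}$, where tr denotes the
trace operator. Now, for any $\mathcal{X}_i \in \mathcal{Z}_D$,
$i = 1, 2, 3$, let $\Pi_{\mathcal{X}_i}$ denote its corresponding
$d$-dimensional projection operator defined as follows:

  $\Pi_{\mathcal{X}_i} = \sum_{j=1}^{d} \zeta_{ij} \otimes \zeta_{ij}$,

where $\{\zeta_{ij} : j = 1, \ldots, d\}$ is some orthonormal basis of
$\mathcal{X}_i$. Now the triangle inequality for the Hilbert-Schmidt norm
yields

  $\|\Pi_{\mathcal{X}_1} - \Pi_{\mathcal{X}_3}\|_{\mathcal{S}} \le \|\Pi_{\mathcal{X}_1} - \Pi_{\mathcal{X}_2}\|_{\mathcal{S}} + \|\Pi_{\mathcal{X}_2} - \Pi_{\mathcal{X}_3}\|_{\mathcal{S}}$.

Since the projection operators are self adjoint, we have

  $\sqrt{\mathrm{tr}(\Pi_{\mathcal{X}_1}^2) + \mathrm{tr}(\Pi_{\mathcal{X}_3}^2) - 2\,\mathrm{tr}(\Pi_{\mathcal{X}_1}\Pi_{\mathcal{X}_3})}$
  $\le \sqrt{\mathrm{tr}(\Pi_{\mathcal{X}_1}^2) + \mathrm{tr}(\Pi_{\mathcal{X}_2}^2) - 2\,\mathrm{tr}(\Pi_{\mathcal{X}_1}\Pi_{\mathcal{X}_2})}$
  $+ \sqrt{\mathrm{tr}(\Pi_{\mathcal{X}_2}^2) + \mathrm{tr}(\Pi_{\mathcal{X}_3}^2) - 2\,\mathrm{tr}(\Pi_{\mathcal{X}_2}\Pi_{\mathcal{X}_3})}$.

Now $\mathrm{tr}(\Pi_{\mathcal{X}_i}^2) = \mathrm{tr}(\Pi_{\mathcal{X}_i}) = d$
and
$\mathrm{tr}(\Pi_{\mathcal{X}_i}\Pi_{\mathcal{X}_j}) = \sum_{k,l=1}^{d} \langle \zeta_{ik}, \zeta_{jl}\rangle^2$
for $i, j = 1, 2, 3$. These last facts along with the definition of $D$ in
(3.2) give

  $D(\mathcal{X}_1, \mathcal{X}_3) \le D(\mathcal{X}_1, \mathcal{X}_2) + D(\mathcal{X}_2, \mathcal{X}_3)$,

which concludes the proof. □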

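Proof of Theorem 2. From the definition of $D$ in (3.2), note that

(B.21)  $\sqrt{2d}\,D(\hat{\mathcal{M}}, \mathcal{M}) = \|\Pi_{\hat{\mathcal{M}}} - \Pi_{\mathcal{M}}\|_{\mathcal{S}}$,

where
$\Pi_{\hat{\mathcal{M}}} = \sum_{j=1}^{d} \hat\psi_j \otimes \hat\psi_j$
and $\Pi_{\mathcal{M}} = \sum_{j=1}^{d} \varphi_j \otimes \varphi_j$ with
$\varphi_1, \ldots, \varphi_d$ forming any orthonormal basis of
$\mathcal{M}$. Now if $\Pi_{\mathcal{M}}$ and $\Pi'_{\mathcal{M}}$ are any
projection operators onto $\mathcal{M}$, then by virtue of Lemma 4 it
holds that
$\|\Pi_{\mathcal{M}} - \Pi'_{\mathcal{M}}\|_{\mathcal{S}} = \sqrt{2d}\,D(\mathcal{M}, \mathcal{M}) = 0$.
Thus, we may proceed as if $\Pi_{\mathcal{M}}$ in (B.21) was formed with
eigenfunctions of $K$, that is, $\varphi_j = \psi_j$ for
$j = 1, \ldots, d$.
Now, we have

(B.22)  $\|\Pi_{\hat{\mathcal{M}}} - \Pi_{\mathcal{M}}\|_{\mathcal{S}} = \Big\|\sum_{j=1}^{d} \hat\psi_j \otimes \hat\psi_j - \sum_{j=1}^{d} \psi_j \otimes \psi_j\Big\|_{\mathcal{S}} \le \sum_{j=1}^{d} \|\hat\psi_j \otimes \hat\psi_j - \psi_j \otimes \psi_j\|_{\mathcal{S}}$,

where $\hat\psi_j \otimes \hat\psi_j$ (resp., $\psi_j \otimes \psi_j$) is
the projection operator onto the eigensubspace generated by
$\hat\theta_j$ (resp., $\theta_j$). Now by part (i) of Theorem 1,
$\|\hat K - K\|_{\mathcal{S}} = O_p(n^{-1/2})$. Thus, Theorem 2.2 in Mas
and Menneteau (2003) implies that
$\|\hat\psi_j \otimes \hat\psi_j - \psi_j \otimes \psi_j\|_{\mathcal{S}} = O_p(n^{-1/2})$
for $j = 1, \ldots, d$. This last fact along with (B.21) and (B.22) yield
$D(\hat{\mathcal{M}}, \mathcal{M}) = O_p(n^{-1/2})$. □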
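Relation (B.21), together with the trace identities used in the proof of Lemma 4, suggests a direct way to compute the distance between two $d$-dimensional subspaces. The sketch below works with finite-dimensional vectors as stand-ins for discretized functions; the function names are hypothetical, and the second function uses the expression $\sqrt{1 - d^{-1}\sum_{k,l}\langle\zeta_{1k}, \zeta_{2l}\rangle^2}$ obtained by combining (B.21) with those trace identities.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the column span of basis (columns need not be orthonormal)."""
    q, _ = np.linalg.qr(basis)
    return q @ q.T

def subspace_distance(basis1, basis2):
    """D = ||P1 - P2||_HS / sqrt(2 d) for two d-dimensional subspaces, as in (B.21)."""
    d = basis1.shape[1]
    diff = projector(basis1) - projector(basis2)
    return np.linalg.norm(diff, "fro") / np.sqrt(2.0 * d)

def subspace_distance_inner(basis1, basis2):
    """Same quantity via sqrt(1 - sum_{k,l} <zeta_1k, zeta_2l>^2 / d)."""
    q1, _ = np.linalg.qr(basis1)
    q2, _ = np.linalg.qr(basis2)
    d = q1.shape[1]
    cross = q1.T @ q2
    return np.sqrt(max(0.0, 1.0 - np.sum(cross ** 2) / d))

rng = np.random.default_rng(4)
X1, X2, X3 = (rng.normal(size=(10, 2)) for _ in range(3))
d12, d23, d13 = (subspace_distance(a, b) for a, b in ((X1, X2), (X2, X3), (X1, X3)))
```

The distance is 0 when both bases span the same subspace and 1 when the two subspaces are orthogonal, and the three values above satisfy the subadditivity shown in Lemma 4.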

Proof of Theorem 3. We first note that from (B.9), the triangle
inequality and the $c_r$ inequality, we have

(B.23)  $E\|\hat K - K\|_{\mathcal{S}}^2 = O(n^{-1})$.

As $\hat\theta_1 \ge \hat\theta_2 \ge \cdots \ge 0$ (with strict inequality
holding with probability one), it holds that
$\{\hat d > d\} = \{\hat\theta_{d+1} > \epsilon\}$. Now since
$\theta_{d+1} = 0$, it holds that
$\hat\theta_{d+1} = |\hat\theta_{d+1} - \theta_{d+1}| \le \|\hat K - K\|_{\mathcal{S}}$
by Lemma 4.2 in Bosq (2000). Collecting these last few facts and applying
the Chebyshev inequality yields

(B.24)  $P(\hat d > d) \le \epsilon^{-2} E\|\hat K - K\|_{\mathcal{S}}^2 = O((\epsilon^2 n)^{-1})$

by (B.23). Next, we turn to $P(\hat d < d)$. Due to the ordering of the
eigenvalues, it holds that
$\{\hat d < d\} = \{\hat\theta_{d-1} < \epsilon\}$. Therefore,

  $P(\hat d < d) = P(\hat\theta_{d-1} < \epsilon)$
  $= P(\theta_{d-1} - \hat\theta_{d-1} > \theta_{d-1} - \epsilon)$
(B.25)
  $\le P(|\hat\theta_{d-1} - \theta_{d-1}| > \theta_{d-1} - \epsilon)$
  $\le P(\|\hat K - K\|_{\mathcal{S}} > \theta_{d-1} - \epsilon)$,

where the final inequality follows from Lemma 4.2 in Bosq (2000). Now since
$\theta_{d-1} > 0$ and $\epsilon \to 0$ as $n \to \infty$, it holds that
$\theta_{d-1} - \epsilon > 0$ for large enough $n$. Thus, by (B.25) and an
application of the Chebyshev inequality to (B.23), we have

(B.26)  $P(\hat d < d) \le (\theta_{d-1} - \epsilon)^{-2} E\|\hat K - K\|_{\mathcal{S}}^2 = O(n^{-1})$.

From (B.24) and (B.26), it follows that

  $P(\hat d \ne d) = P(\hat d < d) + P(\hat d > d) = O((\epsilon^2 n)^{-1}) \to 0$.

This completes the proof. □
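The event identity $\{\hat d > d\} = \{\hat\theta_{d+1} > \epsilon\}$ used above is consistent with an estimator that counts the estimated eigenvalues exceeding the threshold $\epsilon$. A minimal sketch under that reading follows; the function name and the numerical values are illustrative only.

```python
import numpy as np

def estimate_dimension(eigenvalues, eps):
    """Number of estimated eigenvalues strictly above the threshold eps."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(eigenvalues > eps))

theta_hat = [2.1, 0.9, 0.004, 0.001]      # hypothetical estimated eigenvalues
d_hat = estimate_dimension(theta_hat, eps=0.05)
```

The rate $O((\epsilon^2 n)^{-1})$ in the proof indicates the trade-off: the threshold must shrink to zero, but slowly enough that $\epsilon^2 n \to \infty$.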